Chapter 12: Simple Regression
Chapter Contents
12.1 Visual Displays and Correlation Analysis
12.2 Simple Regression
12.3 Regression Terminology
12.4 Ordinary Least Squares Formulas
12.5 Tests for Significance
12.6 Analysis of Variance: Overall Fit
12.7 Confidence and Prediction Intervals for Y
12.8 Residual Tests
12.9 Unusual Observations
12.10 Other Regression Problems
Chapter Learning Objectives (LO's)
LO12-1: Calculate and test a correlation coefficient for significance.
LO12-2: Interpret the slope and intercept of a regression equation.
LO12-3: Make a prediction for a given x value using a regression equation.
LO12-4: Fit a simple regression on an Excel scatter plot.
LO12-5: Calculate and interpret confidence intervals for regression coefficients.
LO12-6: Test hypotheses about the slope and intercept by using t tests.
LO12-7: Perform regression with Excel or other software.
LO12-8: Interpret the standard error, R², ANOVA table, and F test.
LO12-9: Distinguish between confidence and prediction intervals.
LO12-10: Test residuals for violations of regression assumptions.
LO12-11: Identify unusual residuals and high-leverage observations.
12.1 Visual Displays and Correlation Analysis
Visual Displays
• Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.
• A scatter plot
- displays each observed data pair (xi, yi) as a dot on an X/Y grid.
- indicates visually the strength of the relationship between the two variables.
[Figure: Sample Scatter Plot]
Correlation Coefficient
• The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y.
• -1 ≤ r ≤ +1
• r = 0 indicates no linear relationship.
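The following is a minimal sketch of how the sample correlation coefficient could be computed by hand. The function name `sample_correlation` and the five data pairs are hypothetical, chosen only for illustration; the formula used is the standard one (sum of cross-products divided by the square root of the product of the two sums of squares).

```python
import math

def sample_correlation(x, y):
    """Return the sample correlation coefficient r for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(round(r, 4))
```

Because r is bounded by -1 and +1, a value near 0.77 for this toy data would be read as a fairly strong positive linear relationship.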
Scatter Plots Showing Various Correlation Values
[Figure: six scatter plot panels: Strong Positive Correlation, Weak Positive Correlation, Weak Negative Correlation, Strong Negative Correlation, No Correlation, Nonlinear Relation]
Steps in Testing if ρ = 0 (population correlation = 0)
• Step 1: State the Hypotheses
Determine whether you are using a one- or two-tailed test and the level of significance (α).
• Step 2: Specify the Decision Rule
For degrees of freedom df = n - 2, look up the critical value in Appendix D.
• Note: r is an estimate of the population correlation coefficient ρ.
• Step 3: Calculate the Test Statistic
$t_{calc} = r\sqrt{\dfrac{n-2}{1-r^2}}$
• Step 4: Make the Decision
If the sample correlation coefficient r exceeds the critical value $r_\alpha$, then reject H0.
If using the t statistic method, reject H0 if $t_{calc}$ exceeds the critical value $t_\alpha$ or if the p-value ≤ α.
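As a rough illustration of Step 3, the sketch below computes the t statistic for a sample correlation using the standard formula t = r·sqrt((n - 2)/(1 - r²)) with df = n - 2. The function name and the inputs r = 0.60 and n = 27 are assumptions made up for the example.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical sample: r = 0.60 from n = 27 observations
t_calc = correlation_t_statistic(0.60, 27)
print(round(t_calc, 2))
```

The resulting value would then be compared with the critical t from the table for df = n - 2, as described in Step 4.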
Critical Value for Correlation Coefficient (Tests for Significance)
• Equivalently, you can calculate the critical value for the correlation coefficient using
$r_\alpha = \dfrac{t_\alpha}{\sqrt{t_\alpha^2 + n - 2}}$
• This method gives a benchmark for the correlation coefficient.
• However, it gives no p-value and is inflexible if you change your level of significance.
• MegaStat uses this method, giving two-tail critical values.
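A small sketch of the critical-value approach follows. It assumes the standard conversion from a critical t to a critical r, namely t divided by sqrt(t² + n - 2). The function name is hypothetical, and the critical t is typed in from a t table rather than computed (2.069 is the two-tailed 5% value for df = 23).

```python
import math

def critical_r(t_alpha, n):
    """Convert a critical t value (df = n - 2) into a critical correlation."""
    return t_alpha / math.sqrt(t_alpha ** 2 + n - 2)

# Hypothetical case: n = 25, so df = 23 and the two-tailed 5% critical t is 2.069
r_crit = critical_r(2.069, 25)
print(round(r_crit, 4))
```

A sample correlation whose absolute value exceeds this benchmark would lead to rejecting H0 at the chosen α, without producing a p-value.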
12.2 Simple Regression
What is Simple Regression?
• Simple regression analyzes the relationship between two variables.
• It specifies one dependent (response) variable and one independent (predictor) variable.
• The hypothesized relationship here will be linear.
Interpreting an Estimated Regression Equation: Examples
Prediction Using Regression: Examples
12.3 Regression Terminology
Model and Parameters
• The assumed model for a linear relationship is $y = \beta_0 + \beta_1 x + \varepsilon$.
• The relationship holds for all pairs (xi, yi).
• The error term $\varepsilon$ is assumed to be independently normally distributed with mean of 0 and standard deviation $\sigma$.
• The unknown parameters are:
$\beta_0$ the intercept
$\beta_1$ the slope
• The fitted model or regression model used to predict the expected value of Y for a given value of X is $\hat{y} = b_0 + b_1 x$.
• The fitted coefficients are
b0 the estimated intercept
b1 the estimated slope
LO12-4: Fit a simple regression on an Excel scatter plot.
A more precise method is to let Excel calculate the estimates. We enter observations on the independent variable x1, x2, . . ., xn and the dependent variable y1, y2, . . ., yn into separate columns, and let Excel fit the regression equation, as illustrated in Figure 12.6. Excel will choose the regression coefficients so as to produce a good fit.
Slope and Intercept Interpretations
• Figure 12.6 (previous slide) shows a sample of miles per gallon and horsepower for 15 engines. The Excel graph and its fitted regression equation are also shown.
• Slope Interpretation: The slope of -0.0785 says that for each additional unit of engine horsepower, the miles per gallon decreases by 0.0785 mile. This estimated slope is a statistic because a different sample might yield a different estimate of the slope.
• Intercept Interpretation: The intercept value of 49.216 suggests that when the engine has no horsepower, the fuel efficiency would be quite high. However, the intercept has little meaning in this case, not only because zero horsepower makes no logical sense, but also because extrapolating to x = 0 is beyond the range of the observed data.
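To make the slope and intercept interpretation concrete, here is a minimal sketch that plugs a horsepower value into the fitted equation quoted in this slide (intercept 49.216, slope -0.0785). The function name `predict_mpg` and the 200-horsepower input are assumptions for illustration; the slide does not state the range of horsepower in the sample, so the chosen input may or may not lie within it.

```python
B0 = 49.216   # estimated intercept quoted in the slide
B1 = -0.0785  # estimated slope quoted in the slide

def predict_mpg(horsepower):
    """Return the fitted value y-hat = b0 + b1 * x."""
    return B0 + B1 * horsepower

# Hypothetical input: an engine with 200 horsepower
predicted = predict_mpg(200)
print(round(predicted, 3))
```

Each extra unit of horsepower lowers the prediction by 0.0785 mile per gallon, which is exactly what the slope interpretation says. Predictions are only meaningful for horsepower values inside the range of the observed data.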
12.4 Ordinary Least Squares (OLS) Formulas
Slope and Intercept
• The ordinary least squares (OLS) method estimates the slope and intercept of the regression line so that the sum of the squared residuals is minimized.
• The sum of the residuals = 0.
• The sum of the squared residuals is SSE.
• The OLS estimator for the slope is:
$b_1 = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$
or
$b_1 = r\,\dfrac{s_y}{s_x}$, where $s_x$ and $s_y$ are the sample standard deviations of X and Y.
• The OLS estimator for the intercept is:
$b_0 = \bar{y} - b_1\bar{x}$
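The OLS estimators can be sketched in a few lines of Python. This is an illustrative implementation under the usual textbook formulas (slope = sum of cross-products over the sum of squared x deviations, intercept = ȳ - b1·x̄). The function name `ols_fit` and the data are hypothetical.

```python
def ols_fit(x, y):
    """Return (b0, b1), the OLS intercept and slope for simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(round(b0, 3), round(b1, 3), round(sum(residuals), 10))
```

The residuals from the fitted line sum to zero (up to rounding), which matches the property stated for OLS.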
Assessing Fit
• We want to explain the total variation in Y around its mean (SST for Total Sums of Squares): $SST = \sum (y_i-\bar{y})^2$.
• The regression sum of squares (SSR) is the explained variation in Y: $SSR = \sum (\hat{y}_i-\bar{y})^2$.
• The error sum of squares (SSE) is the unexplained variation in Y: $SSE = \sum (y_i-\hat{y}_i)^2$.
• If the fit is good, SSE will be relatively small compared to SST.
• A perfect fit is indicated by an SSE = 0.
• The magnitude of SSE depends on n and on the units of measurement.
Coefficient of Determination
• R² is a measure of relative fit based on a comparison of SSR and SST: $R^2 = SSR/SST = 1 - SSE/SST$.
• Often expressed as a percent, an R² = 1 (i.e., 100%) indicates perfect fit.
• In simple regression, R² = (r)².
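The decomposition SST = SSR + SSE and the resulting R² can be checked numerically. The sketch below is illustrative only: the helper name `sums_of_squares` and the data are hypothetical, and the line is fitted with the standard OLS formulas.

```python
def sums_of_squares(x, y):
    """Fit a simple regression and return (SST, SSR, SSE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                  # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)             # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # unexplained variation
    return sst, ssr, sse

# Hypothetical data, not taken from the slides
sst, ssr, sse = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_squared = ssr / sst
print(round(sst, 3), round(ssr, 3), round(sse, 3), round(r_squared, 3))
```

For this toy data about 60% of the variation in Y is explained by X, and the same figure is obtained from 1 - SSE/SST.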
12.5 Tests for Significance
Standard Error of Regression
• The standard error (s) is an overall measure of model fit: $s = \sqrt{SSE/(n-2)}$.
• If the fitted model's predictions are perfect (SSE = 0), then s = 0. Thus, a small s indicates a better fit.
• Used to construct confidence intervals.
• Magnitude of s depends on the units of measurement of Y and on data magnitude.
Confidence Intervals for Slope and Intercept
• Standard error of the slope and intercept:
$s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i-\bar{x})^2}}$
$s_{b_0} = s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum (x_i-\bar{x})^2}}$
• Confidence interval for the true slope and intercept (df = n - 2):
$b_1 - t_{\alpha/2}\,s_{b_1} \le \beta_1 \le b_1 + t_{\alpha/2}\,s_{b_1}$
$b_0 - t_{\alpha/2}\,s_{b_0} \le \beta_0 \le b_0 + t_{\alpha/2}\,s_{b_0}$
• Note: One can use Excel, Minitab, MegaStat or other software to compute these intervals and do hypothesis tests relating to linear regression.
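A confidence interval for the slope can be sketched as follows. This is a hedged example, not the output of Excel, Minitab, or MegaStat: the function name and data are hypothetical, the standard error formulas are the usual textbook ones (s = sqrt(SSE/(n - 2)), slope standard error = s divided by the square root of the sum of squared x deviations), and the critical t is typed in from a t table (3.182 is the two-tailed 5% value for df = 3).

```python
import math

def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) for the true slope, given a critical t for df = n - 2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))    # standard error of the regression
    s_b1 = s / math.sqrt(sxx)       # standard error of the slope
    return b1 - t_crit * s_b1, b1 + t_crit * s_b1

# Hypothetical data with n = 5, so df = 3
lower, upper = slope_confidence_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3.182)
print(round(lower, 3), round(upper, 3))
```

With only five observations the interval is wide and includes zero, so this toy sample would not support a conclusion that the true slope differs from zero at the 5% level.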
Hypothesis Tests
• If $\beta_1 = 0$, then X cannot influence Y and the regression model collapses to a constant $\beta_0$ plus random error.
• The hypotheses to be tested are:
H0: $\beta_1 = 0$
H1: $\beta_1 \ne 0$
• The test statistic is $t_{calc} = (b_1 - 0)/s_{b_1}$ with df = n - 2.
• Reject H0 if $|t_{calc}| > t_{\alpha/2}$ or if the p-value ≤ α.
12.6 Analysis of Variance: Overall Fit
F Test for Overall Fit
• To test a regression for overall significance, we use an F test to compare the explained (SSR) and unexplained (SSE) sums of squares.
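The F statistic for a simple regression can be sketched from the two sums of squares. The example assumes the standard ANOVA layout for one predictor (1 degree of freedom for regression, n - 2 for error). The function name and the numeric inputs are hypothetical.

```python
def f_statistic(ssr, sse, n):
    """F statistic for overall fit in simple regression (df = 1 and n - 2)."""
    msr = ssr / 1          # mean square due to regression
    mse = sse / (n - 2)    # mean square error
    return msr / mse

# Hypothetical sums of squares from a sample of n = 5
f_calc = f_statistic(3.6, 2.4, 5)
print(round(f_calc, 3))
```

In simple regression the F statistic equals the square of the t statistic for the slope, so the F test and the two-tailed t test for zero slope always agree.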
12.7 Confidence and Prediction Intervals for Y
How to Construct an Interval Estimate for Y
• Confidence interval for the conditional mean of Y.
• Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.
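The difference between the two intervals can be illustrated with the usual textbook half-width formulas: the confidence interval for the conditional mean uses sqrt(1/n + (x - x̄)²/Σ(xi - x̄)²), while the prediction interval for an individual Y adds 1 inside the square root. The sketch below is an assumption-laden example: the function name, the data, the standard error s, and the critical t (3.182 for df = 3, from a t table) are all hypothetical inputs.

```python
import math

def interval_half_widths(x, s, t_crit, x_new):
    """Return (ci_half_width, pi_half_width) for Y at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    base = 1 / n + (x_new - x_bar) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(base)        # conditional mean of Y
    pi_half = t_crit * s * math.sqrt(1 + base)    # individual value of Y
    return ci_half, pi_half

# Hypothetical inputs: x values, s = sqrt(0.8), df = 3, prediction at x = 4
ci_half, pi_half = interval_half_widths([1, 2, 3, 4, 5], math.sqrt(0.8), 3.182, 4)
print(round(ci_half, 3), round(pi_half, 3))
```

The extra 1 under the square root reflects the variation of an individual observation around the mean, which is why the prediction interval is always the wider of the two. Both intervals widen as x_new moves away from the mean of X.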
12.8 Residual Tests
Three Important Assumptions
1. The errors are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are nonautocorrelated).
Non-normal Errors
• Non-normality of errors is a mild violation since the regression parameter estimates b0 and b1 and their variances remain unbiased and consistent.
• Confidence intervals for the parameters may be untrustworthy because the normality assumption is used to justify using Student's t distribution.
• A large sample size would compensate.
• Outliers could pose serious problems.
Normal Probability Plot
• The Normal Probability Plot tests the assumption
H0: Errors are normally distributed
H1: Errors are not normally distributed
• If H0 is true, the residual probability plot should be linear, as shown in the example.
What to Do About Non-Normality?
1. Trim outliers only if they clearly are mistakes.
2. Increase the sample size if possible.
3. Try a logarithmic transformation of both X and Y.
Heteroscedastic Errors (Non-constant Variance)
• The ideal condition is if the error magnitude is constant (i.e., errors are homoscedastic).
• Heteroscedastic errors increase or decrease with X.
• In the most common form of heteroscedasticity, the variances of the estimators are likely to be understated.
• This results in overstated t statistics and artificially narrow confidence intervals.
Tests for Heteroscedasticity
• Plot the residuals against X. Ideally, there is no pattern in the residuals moving from left to right.
• The "fan-out" pattern of increasing residual variance is the most common pattern indicating heteroscedasticity.
What to Do About Heteroscedasticity?
• Transform both X and Y, for example, by taking logs.
• Although it can widen the confidence intervals for the coefficients, heteroscedasticity does not bias the estimates.
Autocorrelated Errors
• Autocorrelation is a pattern of non-independent errors.
• In a first-order autocorrelation, $e_t$ is correlated with $e_{t-1}$.
• The estimated variances of the OLS estimators are biased, resulting in confidence intervals that are too narrow, overstating the model's fit.
Runs Test for Autocorrelation
• In the runs test, count the number of the residual's sign reversals (i.e., how often does the residual cross the zero centerline?).
• If the pattern is random, the number of sign changes should be n/2.
• Fewer than n/2 would suggest positive autocorrelation.
• More than n/2 would suggest negative autocorrelation.
Durbin-Watson (DW) Test
• Tests for autocorrelation under the hypotheses
H0: Errors are non-autocorrelated
H1: Errors are autocorrelated
• The DW statistic will range from 0 to 4.
DW < 2 suggests positive autocorrelation
DW = 2 suggests no autocorrelation (ideal)
DW > 2 suggests negative autocorrelation
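Both diagnostics can be sketched directly from a list of residuals in time order. The example below assumes the standard Durbin-Watson formula (sum of squared successive differences of the residuals divided by the sum of squared residuals). The function names and the residual values are hypothetical.

```python
def count_sign_changes(residuals):
    """Count how often consecutive residuals cross the zero centerline."""
    return sum(1 for a, b in zip(residuals, residuals[1:]) if a * b < 0)

def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals listed in time order
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
changes = count_sign_changes(residuals)
dw = durbin_watson(residuals)
print(changes, round(dw, 4))
```

Here the DW value is very close to 2, which by the rule of thumb above suggests no autocorrelation. A formal conclusion would still require the DW critical values, which this sketch does not compute.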
What to Do About Autocorrelation?
• Transform both variables using the method of first differences, in which both variables are redefined as changes. Then we regress ΔY against ΔX.
• Although it can widen the confidence interval for the coefficients, autocorrelation does not bias the estimates.
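The method of first differences amounts to replacing each series by its period-to-period changes before fitting the regression. A minimal sketch, with a hypothetical helper name and made-up time series:

```python
def first_differences(values):
    """Return the changes between consecutive observations."""
    return [b - a for a, b in zip(values, values[1:])]

# Hypothetical time series, listed in time order
x = [10, 12, 15, 19]
y = [5, 6, 8, 11]
dx = first_differences(x)
dy = first_differences(y)
print(dx, dy)
```

The differenced series are one observation shorter than the originals, and the regression is then run on the changes rather than on the levels.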
12.9 Unusual Observations
Standardized Residuals
• One can use Excel, Minitab, MegaStat or other software to compute standardized residuals.
• If the absolute value of any standardized residual is at least 2, then it is classified as unusual.
Leverage and Influence
• A high leverage statistic indicates the observation is far from the mean of X.
• These observations are influential because they are at the "end of the lever."
• The leverage for observation i is denoted hi.
Leverage
• A leverage that exceeds 3/n is unusual.
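For reference, leverage in simple regression can be computed from the X values alone. The sketch below assumes the standard formula, which is not shown in the slide text: leverage for observation i equals 1/n plus its squared deviation from the mean of X divided by the sum of all squared deviations. The function name and data are hypothetical.

```python
def leverages(x):
    """Return the leverage h_i of each observation in a simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Hypothetical X values
h = leverages([1, 2, 3, 4, 5])
print([round(hi, 2) for hi in h])
```

Observations farthest from the mean of X receive the largest leverage, and an observation exactly at the mean has leverage 1/n. In simple regression the leverages always sum to 2.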
12.10 Other Regression Problems
Outliers
Outliers may be caused by
- an error in recording data
- impossible data
- an observation that has been influenced by an unspecified "lurking" variable that should have been controlled but wasn't.
To fix the problem,
- delete the observation(s)
- delete the data
- formulate a multiple regression model that includes the lurking variable.
Model Misspecification
• If a relevant predictor has been omitted, then the model is misspecified.
• Use multiple regression instead of bivariate regression.
Ill-Conditioned Data
• Well-conditioned data values are of the same general order of magnitude.
• Ill-conditioned data have unusually large or small data values and can cause loss of regression accuracy or awkward estimates.
• Avoid mixing magnitudes by adjusting the magnitude of your data before running the regression.
Spurious Correlation
• In a spurious correlation two variables appear related because of the way they are defined.
• This problem is called the size effect or problem of totals.
Model Form and Variable Transforms
• Sometimes a nonlinear model is a better fit than a linear model.
• Excel offers many model forms.
• Variables may be transformed (e.g., logarithmic or exponential functions) in order to provide a better fit.
• Log transformations reduce heteroscedasticity.
• Nonlinear models may be difficult to interpret.
Chapter 10: Two-Sample Hypothesis Tests
Chapter Contents
10.1 Two-Sample Tests
10.2 Comparing Two Means: Independent Samples
10.3 Confidence Interval for the Difference of Two Means, μ1 − μ2
10.4 Comparing Two Means: Paired Samples
10.5 Comparing Two Proportions
10.6 Confidence Interval for the Difference of Two Proportions, π1 − π2
10.7 Comparing Two Variances
C
ha
SS
pter 1
TwoTwo--Sample Hypothesis TestsSample Hypothesis Tests
Chapter Learning Objectives (LO’s)Chapter Learning
Objectives (LO’s)
10
Chapter Learning Objectives (LO s)Chapter Learning Objectives
(LO s)
LO10-1: Recognize and perform a test for two means with known σ1 and σ2.
LO10-2: Recognize and perform a test for two means with unknown σ1 and σ2.
LO10-3: Recognize paired data and be able to perform a paired t test.
LO10-4: Explain the assumptions underlying the two-sample test of means.
LO10-5: Perform a test to compare two proportions using z.
LO10-6: Check whether normality may be assumed for two proportions.
LO10-7: Use Excel to find p-values for two-sample tests using z or t.
LO10-8: Carry out a test of two variances using the F distribution.
LO10-9: Construct a confidence interval for μ1 − μ2 or π1 − π2 (optional).
10.1 Two-Sample Tests
What Is a Two-Sample Test?
• A two-sample test compares two sample estimates with each other.
• A one-sample test compares a sample estimate to a non-sample benchmark.
Basis of Two-Sample Tests
• Two-sample tests are especially useful because they possess a built-in point of comparison.
• The logic of two-sample tests is based on the fact that two samples drawn from the same population may yield different estimates of a parameter due to chance.
• If the two sample statistics differ by more than the amount attributable to chance, then we conclude that the samples came from populations with different parameter values.
Test Procedure
• State the hypotheses
• Set up the decision rule
• Insert the sample statistics
• Make a decision based on the critical values or using p-values
10.2 Comparing Two Means: Independent Samples
LO10-1: Recognize and perform a test for two means with known σ1 and σ2.
Format of Hypotheses
• The hypotheses for comparing two independent population means µ1 and µ2 are:
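A sketch of the three standard formats, assuming the hypothesized difference is zero (the layout and notation below are an assumption, not copied from the slide):

```latex
\begin{array}{lll}
\text{Left-tailed test} & \text{Two-tailed test} & \text{Right-tailed test} \\
H_0\colon \mu_1 - \mu_2 \ge 0 & H_0\colon \mu_1 - \mu_2 = 0 & H_0\colon \mu_1 - \mu_2 \le 0 \\
H_1\colon \mu_1 - \mu_2 < 0 & H_1\colon \mu_1 - \mu_2 \ne 0 & H_1\colon \mu_1 - \mu_2 > 0
\end{array}
```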
Case 1: Known Variances
• When the variances are known, use the normal distribution for the test (assuming a normal population).
• The test statistic is: z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2)
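As an illustration of Case 1, the following minimal sketch computes the z statistic and a two-tailed p-value for H0: µ1 − µ2 = 0. The function name and the sample figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_z_test(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """Return (z, two-tailed p-value) for H0: mu1 - mu2 = 0 with known sigmas."""
    # Standard error of the difference of two independent sample means.
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (xbar1 - xbar2) / se
    # Two-tailed p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical summary statistics (not from the slides).
z, p_value = two_mean_z_test(xbar1=105.0, xbar2=100.0,
                             sigma1=10.0, sigma2=10.0, n1=50, n2=50)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Here the standard error is √(100/50 + 100/50) = 2, so a difference of 5 gives z = 2.5.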
LO10-2: Recognize and perform a test for two means with unknown σ1 and σ2.
Case 2: Unknown Variances, Assumed Equal
• Since the variances are unknown, they must be estimated and the Student's t distribution used to test the means.
• Assuming the population variances are equal, s1² and s2² can be used to estimate a common pooled variance sp².
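A minimal sketch of the Case 2 calculation, assuming the usual pooled-variance formula sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2) with n1 + n2 − 2 degrees of freedom. The function name and the two small samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Return (t, degrees of freedom, pooled variance) assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # sample variances (n - 1)
    # Weighted average of the two sample variances.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = sqrt(sp_sq / n1 + sp_sq / n2)
    t = (mean(sample1) - mean(sample2)) / se
    return t, n1 + n2 - 2, sp_sq


# Hypothetical data (not from the slides).
group_a = [12, 15, 11, 14, 13]
group_b = [10, 9, 12, 11, 8, 10]
t, df, sp_sq = pooled_t(group_a, group_b)
print(f"t = {t:.4f}, d.f. = {df}, pooled variance = {sp_sq:.4f}")
```

The resulting t would then be compared with a Student's t critical value for the stated degrees of freedom, which this sketch does not look up.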
Case 3: Unknown Variances, Assumed Unequal
• Welch-Satterthwaite test
• A Quick Rule for degrees of freedom is to use min(n1 – 1, n2 – 1).
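The sketch below contrasts the Quick Rule with the Welch-Satterthwaite degrees of freedom. It assumes the standard Satterthwaite approximation; the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Return (t, Welch-Satterthwaite d.f., Quick Rule d.f.) for unequal variances."""
    n1, n2 = len(sample1), len(sample2)
    # Each sample's contribution to the variance of the difference in means.
    v1, v2 = variance(sample1) / n1, variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    df_quick = min(n1 - 1, n2 - 1)
    return t, df_welch, df_quick


# Hypothetical data (not from the slides).
group_a = [12, 15, 11, 14, 13]
group_b = [10, 9, 12, 11, 8, 10]
t, df_welch, df_quick = welch_t(group_a, group_b)
print(f"t = {t:.4f}, Welch d.f. = {df_welch:.2f}, Quick Rule d.f. = {df_quick}")
```

The Quick Rule never exceeds the Welch-Satterthwaite value, so it errs on the conservative side.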
Summary for the Test Statistic
• If the population variances σ1² and σ2² are known, then use the normal distribution.
• If population variances are unknown and estimated using s1² and s2², then use the Student's t distribution.
Steps in Testing Two Means
• Step 1: State the hypotheses
• Step 2: Specify the decision rule and determine the critical value(s).
• Step 3: Calculate the test statistic
• Step 4: Make the decision. Reject H0 if the test statistic falls in the rejection region(s) as defined by the critical value(s).
• Step 5: Take action based on the decision.
Which Assumption Is Best?
• If the sample sizes are equal, the Case 2 and Case 3 test statistics will be identical, although the degrees of freedom may differ.
• If the variances are similar, the two tests will usually agree.
• If no information about the population variances is available, then the best choice is Case 3.
• The fewer assumptions, the better.
Must Sample Sizes Be Equal?
• Unequal sample sizes are common and the formulas still apply.
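The claim that Case 2 and Case 3 give the same test statistic when n1 = n2 can be checked numerically. This is a small sketch with hypothetical data; the helper names are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def t_pooled(x, y):
    """Case 2 statistic: pooled variance, equal variances assumed."""
    n1, n2 = len(x), len(y)
    sp_sq = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(sp_sq * (1 / n1 + 1 / n2))


def t_unequal(x, y):
    """Case 3 statistic: separate variances, unequal variances assumed."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


# Hypothetical samples with very different spreads.
x = [23, 27, 31, 25, 29]
y = [20, 35, 18, 40, 22]

equal_n_gap = abs(t_pooled(x, y) - t_unequal(x, y))            # n1 = n2 = 5
unequal_n_gap = abs(t_pooled(x, y[:3]) - t_unequal(x, y[:3]))  # n1 = 5, n2 = 3
print(equal_n_gap, unequal_n_gap)
```

With equal sample sizes the pooled standard error reduces algebraically to √((s1² + s2²)/n), so the gap is zero up to rounding; with unequal sizes the two statistics separate.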
Large Samples
• For unknown variances, if both samples are large (n1 ≥ 30 and n2 ≥ 30) and the population is not badly skewed, use the following formula with Appendix C: z = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
Caution: Three Issues
1. Are the populations skewed? Are there outliers?
Check using histograms and/or dot plots of each sample. t tests are OK if moderately skewed, especially if samples are large. Outliers are more serious.
2. Are the sample sizes large (n ≥ 30)?
If samples are small, the mean is not a reliable indicator of central tendency and the test may lack power.
3. Is the difference important as well as significant?
A small difference in means or proportions could be significant if the sample size is large.
10.3 Confidence Interval for the Difference of Two Means, μ1 − μ2
LO10-9: Construct a confidence interval for μ1 − μ2 or π1 − π2 (optional).
Confidence Intervals for the Difference of Two Means
10.4 Comparing Two Means: Paired Samples
LO10-3: Recognize paired data and be able to perform a paired t test.
Paired Data
• Data occur in matched pairs when the same item is observed twice but under different circumstances.
• For example, blood pressure is taken before and after a treatment is given.
• Paired data are typically displayed in columns.
Paired t Test
• Paired data typically come from a before/after experiment.
• In the paired t test, the difference between x1 and x2 is measured as d = x1 – x2.
• The mean d̄ and standard deviation sd are computed from the n differences d, exactly as for a single sample.
• The test statistic is just that of a one-sample t test applied to the differences.
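A minimal sketch of the paired calculation, assuming H0: µd = 0 so that t = d̄ / (sd / √n) with n − 1 degrees of freedom. The function name and the before/after readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, d.f., mean difference, s.d. of differences) for H0: mu_d = 0."""
    # Differences d = x1 - x2 for each matched pair (inputs must be equal length).
    d = [x1 - x2 for x1, x2 in zip(first, second)]
    n = len(d)
    d_bar, s_d = mean(d), stdev(d)
    t = d_bar / (s_d / sqrt(n))
    return t, n - 1, d_bar, s_d


# Hypothetical blood-pressure readings before and after a treatment.
before = [140, 152, 138, 147, 160]
after = [135, 150, 130, 146, 151]
t, df, d_bar, s_d = paired_t(before, after)
print(f"mean difference = {d_bar}, s_d = {s_d:.4f}, t = {t:.4f}, d.f. = {df}")
```

Only the column of differences enters the test, which is why the procedure reduces to a one-sample t test.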
Steps in Testing Paired Data
• Step 1: State the hypotheses, for example
H0: µd = 0
H1: µd ≠ 0
• Step 2: Specify the decision rule. Determine the critical values from Appendix D or with the use of technology.
• Step 3: Calculate the test statistic t
• Step 4: Make the decision. Reject H0 if the test statistic falls in the rejection region(s) as defined by the critical values.
Analogy to Confidence Interval
• A two-tailed test for a zero difference is equivalent to asking whether the confidence interval for the true mean difference µd includes zero.
10.5 Comparing Two Proportions
LO10-5: Perform a test to compare two proportions using z.
Testing for Zero Difference: π1 − π2 = 0
• To compare two population proportions π1 and π2, use hypotheses stated in terms of the difference π1 − π2.
Sample Proportions
• The sample proportions are p1 = x1/n1 and p2 = x2/n2, where x is the number of "successes" in each sample.
Pooled Proportion
• If H0 is true, there is no difference between π1 and π2, so the samples can be pooled to estimate the common population proportion.
Test Statistic
• If the samples are large, p1 – p2 may be assumed normally distributed.
• The test statistic is the difference of the sample proportions divided by the standard error of the difference.
• The standard error is calculated by using the pooled proportion.
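The three bullets above can be sketched as follows, assuming the usual pooled proportion (x1 + x2)/(n1 + n2) and a two-tailed test of zero difference. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-tailed p-value, pooled proportion) for H0: pi1 - pi2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both samples estimate one common proportion, so pool them.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_pooled


# Hypothetical counts: 60 successes in 200 trials versus 40 in 200.
z, p_value, p_pooled = two_proportion_z(x1=60, n1=200, x2=40, n2=200)
print(f"pooled p = {p_pooled}, z = {z:.4f}, p-value = {p_value:.4f}")
```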
Steps in Testing Two Proportions
• Step 1: State the hypotheses
• Step 2: Specify the decision rule and determine the critical value(s).
• Step 3: Calculate the test statistic, using a pooled estimate of the common proportion.
• Step 4: Make the decision. Reject H0 if the test statistic falls in the rejection region(s) as defined by the critical value(s).
LO10-6: Check whether normality may be assumed for two proportions.
Checking for Normality
• We have assumed a normal distribution for the statistic p1 – p2.
• This assumption can be checked.
• For each sample, check that nπ ≥ 10 and n(1 – π) ≥ 10, using the sample proportion p in place of π.
• If either sample proportion is not normal, their difference cannot safely be assumed normal.
• The sample size rule of thumb is equivalent to requiring that each sample contains at least 10 "successes" and at least 10 "failures."
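The rule of thumb is simple enough to express directly. In this sketch the function names and the example counts are hypothetical; the threshold of 10 comes from the rule stated above.

```python
def sample_is_large_enough(successes, n, minimum=10):
    """True if one sample has at least `minimum` successes and failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


def normality_may_be_assumed(x1, n1, x2, n2):
    """Both samples must pass for p1 - p2 to be treated as normal."""
    return sample_is_large_enough(x1, n1) and sample_is_large_enough(x2, n2)


# Hypothetical counts.
print(normality_may_be_assumed(60, 200, 40, 200))  # both samples pass
print(normality_may_be_assumed(60, 200, 4, 50))    # second sample has only 4 successes
```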
Testing for Non-Zero Difference
10.6 Confidence Interval for the Difference of Two Proportions, π1 − π2
• If the confidence interval does not include 0, then we will reject the null hypothesis of no difference in the proportions.
10.7 Comparing Two Variances
LO10-8: Carry out a test of two variances using the F distribution.
Format of Hypotheses
• To test whether two population means are equal, we may also need to test whether two population variances are equal.
The F Test
• The test statistic is the ratio of the sample variances: Fcalc = s1²/s2².
• If the variances are equal, this ratio should be near unity: F = 1.
• If the test statistic is far below 1 or above 1, we would reject the hypothesis of equal population variances.
• The numerator s1² has degrees of freedom df1 = n1 – 1 and the denominator s2² has degrees of freedom df2 = n2 – 1.
• The F distribution is skewed with the mean > 1 and its mode < 1.
The F Test: Critical Values
• Critical values for the F test are denoted FL (left tail) and FR (right tail).
• A right-tail critical value FR may be found from Appendix F using df1 and df2 degrees of freedom.
FR = Fdf1, df2
• A left-tail critical value FL may be found by reversing the numerator and denominator degrees of freedom, finding the critical value from Appendix F and taking its reciprocal:
FL = 1/Fdf2, df1
Steps in Testing Two Variances
• Step 1: State the hypotheses, for example
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
• Step 2: Specify the decision rule
Degrees of freedom are:
Numerator: df1 = n1 – 1
Denominator: df2 = n2 – 1
Choose α and find the left-tail and right-tail critical values from Appendix F.
• Step 3: Calculate the test statistic Fcalc = s1²/s2².
• Step 4: Make the decision. Reject H0 if the test statistic falls in the rejection regions as defined by the critical values FL and FR.
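Steps 3 and 4 can be sketched as below. The critical values are passed in rather than computed, because the Python standard library has no F distribution; the two numbers used here are placeholders standing in for table look-ups at the chosen α, and the data and function names are hypothetical.

```python
from statistics import variance


def f_statistic(sample1, sample2):
    """Return (F_calc, numerator d.f., denominator d.f.)."""
    f_calc = variance(sample1) / variance(sample2)
    return f_calc, len(sample1) - 1, len(sample2) - 1


def reject_two_tailed(f_calc, f_left, f_right):
    """Reject H0 when F_calc falls in either tail's rejection region."""
    return f_calc < f_left or f_calc > f_right


# Hypothetical data (not from the slides).
group_a = [12, 15, 11, 14, 13]
group_b = [10, 9, 12, 11, 8, 10]
f_calc, df1, df2 = f_statistic(group_a, group_b)

# Placeholder critical values: look up the real ones for your alpha, df1, and df2.
f_left, f_right = 0.107, 7.39
decision = reject_two_tailed(f_calc, f_left, f_right)
print(f"F = {f_calc:.3f}, d.f. = ({df1}, {df2}), reject H0: {decision}")
```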
Comparison of Variances: One-Tailed Test
• Step 1: State the hypotheses, for example
H0: σ1² ≥ σ2²
H1: σ1² < σ2²
• Step 2: State the decision rule
Degrees of freedom are:
Numerator: df1 = n1 – 1
Denominator: df2 = n2 – 1
Choose α and find the left-tail critical value from Appendix F.
• Step 3: Calculate the test statistic Fcalc = s1²/s2².
• Step 4: Make the decision. Reject H0 if the test statistic falls in the left-tail rejection region as defined by the critical value.
EXCEL's F Test
Assumptions of the F Test
• The F test assumes that the populations being sampled are normal.
• It is sensitive to non-normality of the sampled populations.
• MINITAB reports both the F test and an alternative Levene's test and p-values.
Sampling Distributions and Estimation
Chapter Contents
8.1 Sampling Variation
8.2 Estimators and Sampling Errors
8.3 Sample Mean and the Central Limit Theorem
8.4 Confidence Interval for a Mean (μ) with Known σ
8.5 Confidence Interval for a Mean (μ) with Unknown σ
8.6 Confidence Interval for a Proportion (π)
8.7 Estimating from Finite Populations
8.8 Sample Size Determination for a Mean
8.9 Sample Size Determination for a Proportion
8.10 Confidence Interval for a Population Variance (Optional)
Chapter Learning Objectives (LO's)
LO8-1: Define sampling error, parameter, and estimator.
LO8-2: Explain the desirable properties of estimators.
LO8-3: State the Central Limit Theorem for a mean.
LO8-4: Explain how sample size affects the standard error.
LO8-5: Construct a 90, 95, or 99 percent confidence interval for μ.
LO8-6: Know when to use Student's t instead of z to estimate μ.
LO8-7: Construct a 90, 95, or 99 percent confidence interval for π.
LO8-8: Construct confidence intervals for finite populations.
LO8-9: Calculate sample size to estimate a mean or proportion.
LO8-10: Construct a confidence interval for a variance (optional).
8.1 Sampling Variation
• Sample statistic – a random variable whose value depends on which population items are included in the random sample.
• Depending on the sample size, the sample statistic could either represent the population well or differ greatly from the population.
• This sampling variation can easily be illustrated.
• Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants.
• The sample means tend to be close to the population mean.
• The dot plots show that the sample means have much less variation than the individual sample items.
8.2 Estimators and Sampling Distributions
LO8-1: Define sampling error, parameter, and estimator.
Some Terminology
• Estimator – a statistic derived from a sample to infer the value of a population parameter.
• Estimate – the value of the estimator in a particular sample.
• Population parameters are usually represented by Greek letters and the corresponding statistic by Roman letters.
Examples of Estimators
Sampling Distributions
• The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken.
• Note: An estimator is a random variable since samples vary.
• Sampling error is the difference between an estimate and the corresponding population parameter. For example, if we use the sample mean as an estimate for the population mean, then the sampling error is x̄ – μ.
Bias
• Bias is the difference between the expected value of the estimator and the true parameter. Example for the mean, Bias = E(X̄) – μ.
• An estimator is unbiased if its expected value is the parameter being estimated. The sample mean is an unbiased estimator of the population mean since E(X̄) = μ.
• On average, an unbiased estimator neither overstates nor understates the true parameter.
LO8-2: Explain the desirable properties of estimators.
Efficiency
• Note: Unbiasedness is also a desirable property of an estimator.
• Efficiency refers to the variance of the estimator's sampling distribution.
• A more efficient estimator has smaller variance.
Figure 8.6
Consistency
• A consistent estimator converges toward the parameter being estimated as the sample size increases.
Figure 8.6
8.3 Sample Mean and the Central Limit Theorem
LO8-3: State the Central Limit Theorem for a mean.
The Central Limit Theorem is a powerful result that allows us to approximate the shape of the sampling distribution of the sample mean even when we don't know what the population looks like.
• If the population is exactly normal, then the sample mean follows a normal distribution.
• As the sample size n increases, the distribution of sample means narrows in on the population mean µ.
• If the sample is large enough, the sample means will have approximately a normal distribution even if your population is not normal.
Illustrations of Central Limit Theorem
Using the uniform and a right-skewed distribution.
Applying The Central Limit Theorem
The Central Limit Theorem permits us to define an interval within which the sample means are expected to fall. As long as the sample size n is large enough, we can use the normal distribution regardless of the population shape (or any n if the population is normal to begin with).
LO8-4: Explain how sample size affects the standard error.
Sample Size and Standard Error
Even if the population standard deviation σ is large, the sample means will fall within a narrow interval as long as n is large. The key is the standard error of the mean: σ/√n. The standard error decreases as n increases.
For example, when n = 4 the standard error is half of σ. To halve it again requires n = 16, and to halve it again requires n = 64. To halve the standard error, you must quadruple the sample size (the law of diminishing returns).
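A short sketch of the halving pattern, assuming the standard error formula σ/√n. The value σ = 10 is a hypothetical choice for illustration.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


sigma = 10.0  # hypothetical population standard deviation
for n in (1, 4, 16, 64):
    # Each quadrupling of n cuts the standard error in half
    print(f"n = {n:2d}  standard error = {standard_error(sigma, n):.3f}")
```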
Illustration: All Possible Samples from a Uniform Population
• Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}.
• The population parameters are: µ = 1.5, σ = 1.118.
• The population is uniform, yet the distribution of all possible sample means of size 2 has a peaked triangular shape.
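The triangular shape can be checked by brute force. The sketch below assumes sampling with replacement, which gives 16 ordered samples of size 2 from {0, 1, 2, 3}; that assumption is not stated on the slide.

```python
from collections import Counter
from itertools import product
from statistics import pstdev

population = [0, 1, 2, 3]

# All ordered samples of size 2, drawn with replacement (an assumption)
sample_means = [(a + b) / 2 for a, b in product(population, repeat=2)]
counts = Counter(sample_means)

for value in sorted(counts):
    print(f"mean = {value:.1f}  " + "*" * counts[value])

# Spread of the sample means versus the population spread
print("population sigma:", round(pstdev(population), 3))
print("std. dev. of sample means:", round(pstdev(sample_means), 3))
```

The frequencies rise to a single peak at 1.5 and fall again, and the standard deviation of the sample means equals σ/√2, matching the standard error formula for n = 2.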
8.4 Confidence Interval for a Mean (µ) with Known σ (LO8-5)
LO8-5: Construct a 90, 95, or 99 percent confidence interval for μ.
What is a Confidence Interval?
Choosing a Confidence Level
• A higher confidence level leads to a wider confidence interval.
• Greater confidence implies loss of precision (i.e., greater margin of error).
• 95% confidence is most often used.
Confidence Intervals for Example 8.2
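To see why higher confidence means a wider interval, the following sketch computes the two-tailed z value for each common confidence level using Python's standard library. The helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-tailed critical z value for a given confidence level (e.g. 0.95)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


for level in (0.90, 0.95, 0.99):
    # The margin of error is proportional to z, so a larger z gives a wider interval
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```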
Interpretation
• A confidence interval either does or does not contain µ.
• The confidence level quantifies the risk.
• Out of 100 confidence intervals, approximately 95% may contain µ, while approximately 5% might not contain µ when constructing 95% confidence intervals.
When Can We Assume Normality?
• If σ is known and the population is normal, then we can safely use the formula to compute the confidence interval.
• If σ is known and we do not know whether the population is normal, a common rule of thumb lets us use the formula as long as the sample is large enough and the distribution is approximately symmetric with no outliers.
• Larger n may be needed to assume normality if you are sampling from a strongly skewed population or one with outliers.
8.5 Confidence Interval for a Mean (µ) with Unknown σ (LO8-6)
LO8-6: Know when to use Student’s t instead of z to estimate µ.
Student’s t Distribution
• Use the Student’s t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small.
• t distributions are symmetric and shaped like the standard normal distribution.
• The t distribution is dependent on the size of the sample.
Figure 8.11: Comparison of Normal and Student’s t
Degrees of Freedom
• Degrees of freedom (d.f.) is a parameter based on the sample size that is used to determine the value of the t statistic.
• Degrees of freedom tell how many observations are used to calculate s, the sample standard deviation, less the number of intermediate estimates used in the calculation. The d.f. for the t distribution in this case is given by d.f. = n − 1.
• As n increases, the t distribution approaches the shape of the normal distribution.
• For a given confidence level, t is always larger than z, so a confidence interval based on t is always wider than if z were used.
Comparison of z and t
• For very small samples, t-values differ substantially from the normal.
• As degrees of freedom increase, the t-values approach the normal z-values.
• For example, for n = 31, the degrees of freedom are d.f. = 31 − 1 = 30. So for a 90 percent confidence interval, we would use t = 1.697, which is only slightly larger than z = 1.645.
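The table lookup can be cross-checked without a statistics package. The sketch below is one possible approach, not the method used in the slides: it integrates the Student's t density with Simpson's rule and finds the critical value by bisection. The function names, the step count, and the search bound of 100 are all assumptions chosen for illustration, and `math.gamma` overflows for very large d.f.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-tailed critical t value found by bisection on [0, 100]."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"d.f. = 30, 90% confidence: t = {t_critical(0.90, 30):.3f}")
```

In practice a table, Excel, or a statistics library would be used instead; the sketch only shows where the tabled number comes from.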
Example: GMAT Scores Again
Figure 8.13
• Construct a 90% confidence interval for the mean GMAT score of all MBA applicants.
x̄ = 510, s = 73.77
• Since σ is unknown, use the Student’s t for the confidence interval with d.f. = 20 − 1 = 19.
• Find the critical value of t from Appendix D.
• For a 90% confidence interval, use Appendix D to find t0.05 = 1.729 with d.f. = 19.
Note: One can use Excel, Minitab, etc. to obtain these values as well as to construct confidence intervals.
We are 90 percent confident that the true mean GMAT score might be within the interval [481.48, 538.52].
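The arithmetic behind this interval can be reproduced as follows. The sketch assumes the usual form x̄ ± t·s/√n and takes n = 20, x̄ = 510, s = 73.77, and t = 1.729 directly from the example; the variable names are illustrative.

```python
from math import sqrt

n = 20          # sample size (d.f. = n - 1 = 19)
x_bar = 510     # sample mean GMAT score
s = 73.77       # sample standard deviation
t_crit = 1.729  # t value for 90% confidence, d.f. = 19 (from Appendix D)

margin = t_crit * s / sqrt(n)  # margin of error
low, high = x_bar - margin, x_bar + margin

print(f"margin of error = {margin:.2f}")
print(f"90% confidence interval: [{low:.2f}, {high:.2f}]")
```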
Confidence Interval Width
• Confidence interval width reflects
- the sample size,
- the confidence level and
- the standard deviation.
• To obtain a narrower interval and more precision
- increase the sample size or
- lower the confidence level (e.g., from 90% to 80% confidence).
Using Appendix D
• Beyond d.f. = 50, Appendix D shows d.f. in steps of 5 or 10.
• If the table does not give the exact degrees of freedom, use the t-value for the next lower degrees of freedom.
• This is a conservative procedure since it causes the interval to be slightly wider.
• A conservative statistician may use the t distribution for confidence intervals when σ is unknown because using z would underestimate the margin of error.
8.6 Confidence Interval for a Proportion (π) (LO8-7)
LO8-7: Construct a 90, 95, or 99 percent confidence interval for π.
• A proportion is a mean of data whose only values are 0 or 1.
Applying the CLT
• The distribution of a sample proportion p = x/n is symmetric if π = .50 and, regardless of π, approaches symmetry as n increases.
When is it Safe to Assume Normality of p?
• Rule of Thumb: The sample proportion p = x/n may be assumed to be normal if both nπ ≥ 10 and n(1 − π) ≥ 10.
Sample size to assume normality: Table 8.9
• Since π is unknown, the confidence interval for π based on p = x/n (assuming a large sample) is p ± z·√(p(1 − p)/n).
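A minimal sketch of the large-sample interval for a proportion, assuming the standard form p ± z·√(p(1 − p)/n). The function name and the sample values (x = 30 successes in n = 100 trials) are hypothetical and are not taken from the auditing example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Large-sample confidence interval for a proportion from x successes in n trials."""
    p = x / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical sample: 30 successes out of 100
low, high = proportion_ci(30, 100)
print(f"95% confidence interval for the proportion: [{low:.4f}, {high:.4f}]")
```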
Example: Auditing
8.7 Estimating from Finite Population (LO8-8)
LO8-8: Construct confidence intervals for finite populations.
N = population size; n = sample size
8.8 Sample Size Determination for a Mean (LO8-9)
LO8-9: Calculate sample size to estimate a mean or proportion.
• To estimate a population mean with a precision of ± E (allowable error), you would need a sample of size n = (zσ/E)². The remaining question is how to estimate σ.
• Method 1: Take a Preliminary Sample
Take a small preliminary sample and use the sample s in place of σ in the sample size formula.
• Method 2: Assume Uniform Population
Estimate rough upper and lower limits a and b and set σ = [(b − a)²/12]^½.
• Method 3: Assume Normal Population
Estimate rough upper and lower limits a and b and set σ = (b − a)/4. This assumes normality with most of the data within µ ± 2σ, so the range is 4σ.
• Method 4: Poisson Arrivals
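Methods 2 and 3 can be compared with a short calculation. This sketch assumes the sample size formula n = (zσ/E)², rounded up; the limits a = 10 and b = 50 and the allowable error E = 2 are hypothetical values chosen for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_mean(sigma, error, confidence=0.95):
    """Sample size n = (z*sigma/E)^2, rounded up to the next whole observation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sigma / error) ** 2)


a, b = 10, 50  # hypothetical rough lower and upper limits
E = 2          # hypothetical allowable error

sigma_uniform = sqrt((b - a) ** 2 / 12)  # Method 2: assume uniform population
sigma_normal = (b - a) / 4               # Method 3: assume normal population

print("Method 2 (uniform):", sample_size_for_mean(sigma_uniform, E))
print("Method 3 (normal): ", sample_size_for_mean(sigma_normal, E))
```

The uniform assumption gives the larger σ estimate here, so it calls for the larger sample.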
8.9 Sample Size Determination for a Proportion (LO8-9)
• To estimate a population proportion with a precision of ± E (allowable error), you would need a sample of size n = (z/E)² π(1 − π).
• Since π is a number between 0 and 1, the allowable error E is also between 0 and 1.
• Method 1: Assume that π = .50
This conservative method ensures the desired precision. However, the sample may end up being larger than necessary.
• Method 2: Take a Preliminary Sample
Take a small preliminary sample and use the sample p in place of π in the sample size formula.
• Method 3: Use a Prior Sample or Historical Data
How often are such samples available? Unfortunately, π might be different enough to make it a questionable assumption.
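The effect of the conservative choice π = .50 can be seen in a short calculation. The sketch assumes the formula n = (z/E)² π(1 − π), rounded up; the allowable error E = 0.03 and the alternative estimate π = 0.20 are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(error, pi=0.5, confidence=0.95):
    """Sample size n = (z/E)^2 * pi * (1 - pi), rounded up."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z / error) ** 2 * pi * (1 - pi))


# Method 1: conservative assumption pi = .50
print("pi = .50:", sample_size_for_proportion(0.03))
# Methods 2 and 3: a hypothetical preliminary or prior estimate of pi = .20
print("pi = .20:", sample_size_for_proportion(0.03, pi=0.20))
```

Because π(1 − π) is largest at π = .50, Method 1 never gives a sample that is too small, only one that may be larger than needed.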
8.10 Confidence Interval for a Population Variance (σ²) (LO8-10)
LO8-10: Construct a confidence interval for a variance (optional).
Chi-Square Distribution
• If the population is normal, then the sample variance s² follows the chi-square distribution (χ²) with degrees of freedom d.f. = n − 1.
• Lower (χ²L) and upper (χ²U) tail percentiles for the chi-square distribution can be found using Appendix E.
Confidence Interval
• Using the sample variance s², the confidence interval is (n − 1)s²/χ²U ≤ σ² ≤ (n − 1)s²/χ²L.
• To obtain a confidence interval for the standard deviation σ, take the square root of the interval bounds.
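A sketch of the variance interval, assuming the standard form (n − 1)s²/χ²U ≤ σ² ≤ (n − 1)s²/χ²L. The sample (n = 11, s² = 4.0) is hypothetical, and the critical values 3.247 and 20.483 are the usual table values for d.f. = 10 at 95% confidence, entered by hand as a table lookup would supply them.

```python
from math import sqrt


def variance_ci(s2, n, chi2_lower, chi2_upper):
    """Confidence interval for a population variance from tabled chi-square values."""
    df = n - 1
    return df * s2 / chi2_upper, df * s2 / chi2_lower


# Hypothetical sample: n = 11, s^2 = 4.0
# Chi-square critical values for d.f. = 10, 95% confidence (table lookup)
low, high = variance_ci(4.0, 11, chi2_lower=3.247, chi2_upper=20.483)

print(f"variance:           [{low:.3f}, {high:.3f}]")
print(f"standard deviation: [{sqrt(low):.3f}, {sqrt(high):.3f}]")
```

The interval is not symmetric around s² because the chi-square distribution is skewed.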
You can use Appendix E to find critical chi-square values.
Caution: Assumption of Normality
• The methods described for confidence interval estimation of the variance and standard deviation depend on the population having a normal distribution.
• If the population does not have a normal distribution, then the confidence interval should not be considered accurate.
Chapter 7
Continuous Probability Distributions
Chapter Contents
7.1 Describing a Continuous Distribution
7.2 Uniform Continuous Distribution
7.3 Normal Distribution
7.4 Standard Normal Distribution
7.5 Normal Approximations
7.6 Exponential Distribution
7.7 Triangular Distribution (Optional)
Chapter Learning Objectives (LO’s)
LO7-1: Define a continuous random variable.
LO7-2: Calculate uniform probabilities.
LO7-3: Know the form and parameters of the normal distribution.
LO7-4: Find the normal probability for given z or x using tables or Excel.
LO7-5: Solve for z or x for a given normal probability using tables or Excel.
LO7-6: Use the normal approximation to a binomial or a Poisson distribution.
LO7-7: Find the exponential probability for a given x.
LO7-8: Solve for x for given exponential probability.
LO7-9: Use the triangular distribution for “what-if” analysis (optional).
7.1 Describing a Continuous Distribution (LO7-1)
LO7-1: Define a continuous random variable.
Events as Intervals
• Discrete Variable: each value of X has its own probability P(X).
• Continuous Variable: events are intervals and probabilities are areas under continuous curves. A single point has no probability.
PDF: Probability Density Function
Continuous PDF’s:
• Denoted f(x)
• Must be nonnegative
• Total area under curve = 1
• Mean, variance and shape depend on the PDF parameters
• Reveals the shape of the distribution
CDF: Cumulative Distribution Function
Continuous CDF’s:
• Denoted F(x)
• Shows P(X ≤ x), the cumulative proportion of scores
• Useful for finding probabilities
Probabilities as Areas
Continuous probability functions:
• Unlike discrete distributions, the probability at any single point = 0.
• The entire area under any PDF, by definition, is set to 1.
• Mean is the balance point of the distribution.
Expected Value and Variance
The mean and variance of a continuous random variable are analogous to E(X) and Var(X) for a discrete random variable. Here the integral sign replaces the summation sign. Calculus is required to compute the integrals.
7.2 Uniform Continuous Distribution (LO7-2)
LO7-2: Calculate uniform probabilities.
Characteristics of the Uniform Distribution
If X is a random variable that is uniformly distributed between a and b, its PDF has constant height.
• Denoted U(a, b)
• Area = base × height = (b − a) × 1/(b − a) = 1
Example: Anesthesia Effectiveness
• An oral surgeon injects a painkiller prior to extracting a tooth. Given the varying characteristics of patients, the dentist views the time for anesthesia effectiveness as a uniform random variable that takes between 15 minutes and 30 minutes.
• X is U(15, 30)
• a = 15, b = 30, find the mean and standard deviation.
• Find the probability that the anesthetic takes between 20 and 25 minutes to become effective.
P(20 < X < 25) = (25 − 20)/(30 − 15) = 5/15 = 0.3333 = 33.33%
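The anesthesia example can be worked end to end in a few lines. The sketch assumes the standard uniform formulas, mean (a + b)/2 and standard deviation √((b − a)²/12), which are not shown in the surviving slide text; the function names are illustrative.

```python
from math import sqrt


def uniform_mean(a, b):
    """Mean of U(a, b)."""
    return (a + b) / 2


def uniform_sd(a, b):
    """Standard deviation of U(a, b)."""
    return sqrt((b - a) ** 2 / 12)


def uniform_prob(a, b, c, d):
    """P(c < X < d) for X ~ U(a, b), assuming a <= c <= d <= b."""
    return (d - c) / (b - a)


a, b = 15, 30  # minutes, from the anesthesia example
print(f"mean = {uniform_mean(a, b):.1f} minutes")
print(f"standard deviation = {uniform_sd(a, b):.2f} minutes")
print(f"P(20 < X < 25) = {uniform_prob(a, b, 20, 25):.4f}")
```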
7.3 Normal Distribution (LO7-3)
LO7-3: Know the form and parameters of the normal distribution.
Characteristics of the Normal Distribution
• Normal or Gaussian (or bell-shaped) distribution was named for German mathematician Karl Gauss (1777 – 1855).
• Domain is −∞ < X < +∞.
• Almost all (99.7%) of the area under the normal curve is included in the range µ − 3σ < X < µ + 3σ.
• Symmetric and unimodal about the mean.
• Normal PDF f(x) reaches a maximum at µ and has points of inflection at µ ± σ.
Bell-shaped curve
NOTE: All normal distributions have the same shape but differ in the axis scales.
• Normal CDF
7.4 Standard Normal Distribution (LO7-3)
Characteristics of the Standard Normal Distribution
• Since for every value of µ and σ there is a different normal distribution, we transform a normal random variable to a standard normal distribution with µ = 0 and σ = 1 using the formula z = (x − µ)/σ.
• Standard normal PDF f(z) reaches a maximum at z = 0 and has points of inflection at ±1.
• Shape is unaffected by the transformation. It is still a bell-shaped curve.
Figure 7.11
• Standard normal CDF
• A common scale from −3 to +3 is used.
• Entire area under the curve is unity.
• The probability of an event P(z1 < Z < z2) is a definite integral of f(z).
• However, standard normal tables or Excel functions can be used to find the desired probabilities.
Normal Areas from Appendix C-1
• Appendix C-1 allows you to find the area under the curve from 0 to z.
• For example, find P(0 < Z < 1.96):
• Now find P(−1.96 < Z < 1.96).
• Due to symmetry, P(−1.96 < Z < 0) is the same as P(0 < Z < 1.96) = .4750.
• So, P(−1.96 < Z < 1.96) = .4750 + .4750 = .9500 or 95% of the area under the curve.
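The table values above can be verified with Python's standard library, which provides the standard normal CDF through `statistics.NormalDist`. This is a cross-check of the Appendix C-1 lookup, not a replacement for it; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area from 0 to 1.96 (what Appendix C-1 tabulates)
area_0_to_z = Z.cdf(1.96) - Z.cdf(0)

# Symmetric area from -1.96 to +1.96
area_symmetric = Z.cdf(1.96) - Z.cdf(-1.96)

print(f"P(0 < Z < 1.96)     = {area_0_to_z:.4f}")
print(f"P(-1.96 < Z < 1.96) = {area_symmetric:.4f}")
```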
Basis for the Empirical Rule
• Approximately 68% of the area under the curve is between µ ± 1σ.
• Approximately 95% of the area under the curve is between µ ± 2σ.
• Approximately 99.7% of the area under the curve is between µ ± 3σ.
LO7-4: Find the normal probability for given z or x using tables or Excel.
Normal Areas from Appendix C-2
• Appendix C-2 allows you to find the area under the curve from the left of z (similar to Excel).
• For example, P(Z < −1.96), P(Z < 1.96), and P(−1.96 < Z < 1.96).
Normal Areas from Appendices C-1 or C-2
• Appendices C-1 and C-2 yield identical results.
• Use whichever table is easiest.
Finding z for a Given Area
• Appendices C-1 and C-2 can be used to find the z-value corresponding to a given probability.
• For example, what z-value defines the top 1% of a normal distribution?
• This implies that 49% of the area lies between 0 and z, which gives z = 2.33 by looking for an area of 0.4900 in Appendix C-1.
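The same lookup can be done in reverse with the inverse CDF. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library; the table answer of 2.33 is this value rounded to two decimals.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# The top 1% starts where the cumulative area reaches 99%
z_top_1_percent = Z.inv_cdf(0.99)

# Check against Appendix C-1: area between 0 and z should be about .4900
area_0_to_z = Z.cdf(z_top_1_percent) - 0.5

print(f"z = {z_top_1_percent:.4f}")
print(f"area between 0 and z = {area_0_to_z:.4f}")
```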
Finding Areas by Using Standardized Variables
• Suppose John took an economics exam and scored 86 points. The class mean was 75 with a standard deviation of 7. What percentile is John in? That is, what is P(X < 86), where X represents the exam scores?
• Standardizing gives z = (x − μ)/σ = (86 − 75)/7 = 1.57.
• So John’s score is 1.57 standard deviations above the mean.
• P(X < 86) = P(Z < 1.57) = .9418 (from Appendix C-2).
• So, John is approximately in the 94th percentile.
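The percentile calculation can be sketched in code as follows. This is an illustration only, assuming Python's statistics.NormalDist; the variable names are not from the slides. Because the code keeps the unrounded z, its answer may differ from the table value .9418 in the fourth decimal place.

```python
from statistics import NormalDist

mu, sigma, score = 75, 7, 86

z = (score - mu) / sigma                       # standardize John's score
percentile = NormalDist(mu, sigma).cdf(score)  # P(X < 86)

print(f"z = {z:.2f}, P(X < 86) = {percentile:.4f}")
```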
NOTE: You can use Excel, Minitab, a TI-83/84 calculator, etc. to compute these probabilities directly.
LO7-5: Solve for z or x for a normal probability using tables or Excel.
Inverse Normal
• How can we find the various normal percentiles (5th, 10th, 25th, 75th, 90th, 95th, etc.), known as the inverse normal? That is, how can we find X for a given area? We simply turn the standardizing transformation z = (x − μ)/σ around: x = μ + zσ.
• For example, suppose that John’s economics professor has decided that any student who scores below the 10th percentile must retake the exam.
• The exam scores are normal with μ = 75 and σ = 7.
• What is the score that would require a student to retake the exam?
• We need to find the value of x that satisfies P(X < x) = .10.
• The z-score for the 10th percentile is z = −1.28.
• The steps to solve the problem are:
• Use Appendix C or Excel to find z = −1.28 to satisfy P(Z < −1.28) = .10.
• Substitute the given information into z = (x − μ)/σ to get −1.28 = (x − 75)/7.
• Solve for x to get x = 75 − (1.28)(7) = 66.04 (or 66 after rounding).
• Students who score below 66 points on the economics exam will be required to retake the exam.
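The same steps can be sketched in code. This is a minimal illustration, assuming Python's statistics.NormalDist.inv_cdf stands in for Appendix C or Excel; the variable names are not from the slides. Because the code uses the unrounded z rather than −1.28, the result can differ from a table-based calculation in the second decimal place.

```python
from statistics import NormalDist

mu, sigma = 75, 7

z = NormalDist().inv_cdf(0.10)  # z-value with 10% of the area to its left
x = mu + z * sigma              # turn the standardizing transformation around

print(f"z = {z:.2f}, cutoff score = {x:.2f}")
```

Rounding the cutoff to a whole number of points gives 66, as on the slide.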
7.5 Normal Approximations
LO7-6: Use the normal approximation to a binomial or a Poisson.
Normal Approximation to the Binomial
• Binomial probabilities are difficult to calculate when n is large.
• Use a normal approximation to the binomial distribution.
• As n becomes large, the binomial bars become smaller and continuity is approached.
• Rule of thumb: when nπ ≥ 10 and n(1 − π) ≥ 10, it is appropriate to use the normal approximation to the binomial distribution.
• In this case, the mean and standard deviation for the binomial distribution are μ = nπ and σ = √(nπ(1 − π)).
Example: Coin Flips
• If we were to flip a coin n = 32 times and π = .50, are the requirements for a normal approximation to the binomial distribution met?
• nπ = 32 × .50 = 16 and n(1 − π) = 32 × (1 − .50) = 16.
• So, a normal approximation can be used.
• When translating a discrete scale into a continuous scale,
care must be taken about individual points.
• For example, find the probability of more than 17 heads in
32 flips of a fair coin.
• However, “more than 17” actually falls between 17 and 18
on a discrete scale.
• Since the cutoff point for “more than 17” is halfway between 17 and 18, we add 0.5 to the lower limit and find P(X > 17.5).
• This addition to X is called the Continuity Correction.
• At this point, the problem can be completed as any normal distribution problem.
P(X > 17) = P(X ≥ 18) ≈ P(X > 17.5) = P(Z > (17.5 − 16)/2.828) = P(Z > 0.53) = 0.2981
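To see how close the continuity-corrected approximation comes, it can be compared with the exact binomial sum. The sketch below is illustrative only; it assumes Python's math.comb and statistics.NormalDist, and the variable names are not from the slides. The normal result may differ from 0.2981 in the fourth decimal place because z is not rounded to 0.53.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi = 32, 0.50
mu = n * pi                      # binomial mean, n * pi = 16
sigma = sqrt(n * pi * (1 - pi))  # binomial standard deviation, about 2.828

# Normal approximation with the continuity correction: P(X > 17.5)
approx = 1 - NormalDist(mu, sigma).cdf(17.5)

# Exact binomial probability of 18 or more heads, for comparison
exact = sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(18, n + 1))

print(f"normal approximation = {approx:.4f}, exact binomial = {exact:.4f}")
```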
Normal Approximation to the Poisson
• The normal approximation to the Poisson distribution works best when λ is large (e.g., when λ exceeds the values in Appendix B).
• Set the normal μ and σ equal to the mean and standard deviation for the Poisson distribution: μ = λ and σ = √λ.
Example: Utility Bills
• On Wednesday between 10 A.M. and noon, customer billing inquiries arrive at a mean rate of 42 inquiries per hour at Consumers Energy. What is the probability of receiving more than 50 calls in an hour?
• To find P(X > 50) calls, use the continuity-corrected cutoff point halfway between 50 and 51 (i.e., X = 50.5).
• At this point, the problem can be completed as any normal distribution problem.
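One way to finish the utility-bills problem is sketched below. It assumes the usual Poisson facts that the mean is λ and the standard deviation is √λ, and it uses Python's statistics.NormalDist; the variable names are not from the slides.

```python
from math import sqrt
from statistics import NormalDist

lam = 42          # mean arrival rate, inquiries per hour
mu = lam          # Poisson mean
sigma = sqrt(lam) # Poisson standard deviation, about 6.48

z = (50.5 - mu) / sigma                       # standardized continuity-corrected cutoff
approx = 1 - NormalDist(mu, sigma).cdf(50.5)  # P(X > 50.5)

print(f"z = {z:.2f}, P(X > 50) is approximately {approx:.4f}")
```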
7.6 Exponential Distribution
LO7-7: Find the exponential probability for a given x.
Characteristics of the Exponential Distribution
• If events per unit of time follow a Poisson distribution, the time until the next event follows the Exponential distribution.
• The time until the next event is a continuous variable.
NOTE: Here we will find probabilities > x or ≤ x.
Probability of waiting more than x: P(X > x) = e^(−λx)
Probability of waiting less than or equal to x: P(X ≤ x) = 1 − e^(−λx)
• Between 2 P.M. and 4 P.M. on Wednesday, patient insurance inquiries arrive at Blue Choice insurance at a mean rate of 2.2 calls per minute.
• What is the probability of waiting more than 30 seconds (i.e., 0.50 minutes) for the next call?
• P(X > 0.50) = e^(−λx) = e^(−(2.2)(0.5)) = .3329, or a 33.29% chance of waiting more than 30 seconds for the next call.
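The waiting-time calculation is short enough to check directly. The sketch below assumes only Python's math.exp; the variable names are not from the slides.

```python
from math import exp

lam = 2.2  # mean arrival rate, calls per minute
x = 0.50   # waiting time in minutes (30 seconds)

p_more = exp(-lam * x)  # P(X > x): probability of waiting more than x
p_less = 1 - p_more     # P(X <= x): probability of waiting x or less

print(f"P(X > 0.50) = {p_more:.4f}, P(X <= 0.50) = {p_less:.4f}")
```

Note that λ and x must use the same time unit, which is why 30 seconds is entered as 0.50 minutes.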
P(X > 0.50) = .3329 and P(X ≤ 0.50) = 1 − .3329 = .6671
LO7-8: Solve for x for given exponential probability.
Inverse Exponential
• If the mean arrival rate is 2.2 calls per minute, we want the 90th percentile for waiting time (the top 10% of waiting time).
• Find the x-value that defines the upper 10%.
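A possible way to find this x-value is to set the upper-tail probability e^(−λx) equal to .10 and solve for x, which gives x = −ln(.10)/λ. The sketch below assumes only Python's math module; the variable names are not from the slides.

```python
from math import exp, log

lam = 2.2          # mean arrival rate, calls per minute
upper_tail = 0.10  # area to the right of the 90th percentile

# Solve exp(-lam * x) = upper_tail for x
x = -log(upper_tail) / lam

print(f"90th percentile waiting time = {x:.4f} minutes ({x * 60:.1f} seconds)")
```

Substituting x back into e^(−λx) should return .10, which is a quick check on the algebra.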
Mean Time Between Events
7.7 Triangular Distribution
LO7-9: Use the triangular distribution for “what-if” analysis (optional).
Characteristics of the Triangular Distribution
• The triangular distribution is a way of thinking about variation that corresponds rather well to what-if analysis in business.
• It is not surprising that business analysts are attracted to the triangular model.
• Its finite range and simple form are more understandable than a normal distribution.
• It is more versatile than a normal, because it can be skewed in either direction.
• Yet it has some of the nice properties of a normal, such as a distinct mode.
• The triangular model is especially handy for what-if analysis when the business case depends on predicting a stochastic variable (e.g., the price of a raw material, an interest rate, a sales volume).
• If the analyst can anticipate the range (a to c) and most likely value (b), it will be possible to calculate probabilities of various outcomes.
• Many times, such distributions will be skewed, so a normal wouldn’t be much help.
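To illustrate how probabilities follow from a, b, and c, the sketch below evaluates the standard cumulative distribution function of a triangular distribution. The function name triangular_cdf and the price figures are hypothetical, chosen only for illustration; they do not come from the slides.

```python
def triangular_cdf(x, a, b, c):
    """P(X <= x) for a triangular distribution with minimum a, mode b, maximum c (a < b < c)."""
    if x <= a:
        return 0.0
    if x >= c:
        return 1.0
    if x <= b:
        # Left of the mode: area of the rising part of the triangle
        return (x - a) ** 2 / ((c - a) * (b - a))
    # Right of the mode: one minus the area of the remaining right-hand triangle
    return 1.0 - (c - x) ** 2 / ((c - a) * (c - b))


# Hypothetical what-if inputs: a raw-material price that ranges from 40 to 80,
# with 50 as the most likely value (a right-skewed case).
a, b, c = 40.0, 50.0, 80.0

p_at_most_60 = triangular_cdf(60.0, a, b, c)  # chance the price is 60 or less
p_above_60 = 1 - p_at_most_60                 # chance the price exceeds 60

print(f"P(X <= 60) = {p_at_most_60:.4f}, P(X > 60) = {p_above_60:.4f}")
```

Because the mode sits closer to the minimum than to the maximum, this example is skewed to the right, a shape a normal model could not represent.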
SSChaSimple RegressionSimple Regressionpter C.docx

  • 1. SS C ha Simple RegressionSimple Regression pter Chapter ContentsChapter Contents 12 12.1 Visual Displays and Correlation Analysis12.1 Visual Displays and Correlation Analysisp y yp y y 12.2 Simple Regression12.2 Simple Regression 12 3 Regression Terminology12 3 Regression Terminology12.3 Regression Terminology12.3 Regression Terminology 12.4 Ordinary Least Squares Formulas12.4 Ordinary Least Squares Formulas 12 T f Si ifi12 T f Si ifi12.5 Tests for Significance12.5 Tests for Significance 12.6 Analysis of Variance: Overall Fit12.6 Analysis of Variance: Overall Fit 12.7 Confidence and Prediction Intervals for 12.7 Confidence and Prediction Intervals for YY 12-1 SS C
  • 2. ha Simple RegressionSimple Regression pter Chapter ContentsChapter Contents 12 12 8 Residual Tests12 8 Residual Tests12.8 Residual Tests12.8 Residual Tests 12.9 Unusual Observations12.9 Unusual Observations 12 10 Oth R i P bl12 10 Oth R i P bl12.10 Other Regression Problems12.10 Other Regression Problems 12-2 C ha SS pter 1 Simple RegressionSimple Regression Chapter Learning Objectives (LO’s)Chapter Learning Objectives (LO’s) 12 Chapter Learning Objectives (LO s)Chapter Learning Objectives (LO s)
  • 3. LO12LO12--1: 1: Calculate and test a correlation Calculate and test a correlation coefficient coefficient for for significancesignificance.. LO12LO12--2: 2: Interpret Interpret the slope and intercept of a regression equation.the slope and intercept of a regression equation. LO12LO12--3: 3: Make Make a prediction for a given a prediction for a given x value using a x value using a regressionregression equationequation..qq LO12LO12--4: 4: Fit a simple regression on an Excel scatter plot.Fit a simple regression on an Excel scatter plot. LO12LO12--5:5: Calculate and interpretCalculate and interpret confidenceconfidence intervals forintervals for regressionregressionLO12LO12 5: 5: Calculate and interpret Calculate and interpret confidence confidence intervals for intervals for regressionregression coefficientscoefficients.. LO12LO12 6:6: Test hypotheses about the slope and intercept by usingTest hypotheses about the slope and intercept by using t testst tests 12-3 LO12LO12--6: 6: Test hypotheses about the slope and intercept by using Test hypotheses about the slope and intercept by using t tests.t tests. C ha
  • 4. ff pter Analysis of VarianceAnalysis of Variance Ch t L i Obj ti (LO’ )Ch t L i Obj ti (LO’ ) 12 Chapter Learning Objectives (LO’s)Chapter Learning Objectives (LO’s) LO12LO12--7:7: Perform regression with Excel or other software.Perform regression with Excel or other software. LO12LO12--8:8: Interpret the standard errorInterpret the standard error RR22 ANOVA table and F testANOVA table and F testLO12LO12 8: 8: Interpret the standard error, Interpret the standard error, RR , ANOVA table, and F test., ANOVA table, and F test. LO12LO12--9:9: Distinguish between confidence and prediction intervals.Distinguish between confidence and prediction intervals. LO12LO12 1010 T t id l f i l ti f i tiT t id l f i l ti f i tiLO12LO12--10:10: Test residuals for violations of regression assumptions.Test residuals for violations of regression assumptions. LO12LO12--11:11: Identify unusual residuals and highIdentify unusual residuals and high--leverage observations.leverage observations. 12-4 12 1 Visual12 1 Visual Displays andDisplays and
  • 5. C ha 12.1 Visual 12.1 Visual Displays and Displays and Correlation AnalysisCorrelation Analysis pter 1 Visual DisplaysVisual Displays 12 •• Begin the analysis of Begin the analysis of bivariate databivariate data (i.e., two variables) with a (i.e., two variables) with a scatter plotscatter plot.. A tt l tA tt l t•• A scatter plot A scatter plot -- displays each observed data pair (displays each observed data pair (xxii, , yyii) as a dot on an ) as a dot on an X/YX/Y grid.grid. -- indicates visually the strength of the relationship between theindicates visually the strength of the relationship between theindicates visually the strength of the relationship between the indicates visually the strength of the relationship between the two variables.two variables. Sample Scatter Plot 12-5 C ha
  • 6. 12 1 Visual12 1 Visual Displays andDisplays and pter 1 LO12LO12--11 12.1 Visual 12.1 Visual Displays and Displays and Correlation AnalysisCorrelation Analysis 12 LO12LO12--1: 1: Calculate and test a correlation coefficient for significance.Calculate and test a correlation coefficient for significance. Correlation CoefficientCorrelation Coefficient •• The sample correlation coefficient (r) measures the•• The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y. -1 ≤ r ≤ +1 r = 0 indicates no linear relationship 12-6 C ha 12 1 Visual12 1 Visual Displays andDisplays and pter 1 12.1 Visual 12.1 Visual Displays and Displays and Correlation AnalysisCorrelation AnalysisLO12LO12--11 Scatter Plots Showing Various Correlation ValuesScatter Plots Showing Various Correlation Values 12
  • 7. Strong Positive Correlation Weak Positive Correlation Weak Negative Correlation 12-7Strong Negative Correlation No Correlation Nonlinear Relation C ha 12 1 Visual12 1 Visual Displays andDisplays and pter 1 LO12LO12--11 12.1 Visual 12.1 Visual Displays and Displays and Correlation AnalysisCorrelation Analysis correlation = 0)= 0 (population correlation = 0) 12 •• Step 1:Step 1: State the HypothesesState the Hypotheses Determine whether you are using a one or twoDetermine whether you are using a one or two--tailed test and the tailed test and the •• Step 2:Step 2: Specify the Decision RuleSpecify the Decision Rule For degrees of freedom For degrees of freedom df df = = nn --2,
  • 8. Appendi DAppendi DAppendix D.Appendix D. •• Note: r is an estimate of the population 12-8 C ha 12 1 Visual12 1 Visual Displays andDisplays and pter 1 LO12LO12--11 12.1 Visual 12.1 Visual Displays and Displays and Correlation AnalysisCorrelation Analysis Steps in Testi correlation = 0)= 0 (population correlation = 0) 12 •• Step 3:Step 3: Calculate the Test StatisticCalculate the Test Statistic •• Step 4: Step 4: Make the DecisionMake the Decision If the sample correlation coefficientIf the sample correlation coefficient rr exceeds the critical valueexceeds the critical value rr ,,If the sample correlation coefficient If the sample correlation coefficient rr exceeds the critical value exceeds the then reject then reject HH00.. If using the If using the tt statistic method, reject statistic --
  • 9. 12-9 C ha 12 1 Visual12 1 Visual Displays andDisplays and pter 1 LO12LO12--11 12.1 Visual 12.1 Visual Displays and Displays and Correlation AnalysisCorrelation Analysis Critical Value for Correlation Coefficient (Critical Value for Correlation Coefficient (Tests for Significance)Tests for Significance) 12 •• Equivalently, you can calculate the critical value for the correlation Equivalently, you can calculate the critical value for the correlation coefficient usingcoefficient using •• This method gives a benchmark for the correlation coefficient.This method gives a benchmark for the correlation coefficient.gg •• However, there is no However, there is no pp--value and is inflexible if you change your value and is inflexible if you change your • MegaStat uses this method, giving two-tail critical values for
  • 10. 12-10 C ha 12 1 Visual12 1 Visual Displays andDisplays and pter 1 LO12LO12--11 12.1 Visual 12.1 Visual Displays and Displays and Correlation AnalysisCorrelation Analysis 12 12-11 C ha 12 2 Si l R i12 2 Si l R i pter 1 12.2 Simple Regression12.2 Simple Regression What is Simple Regression?What is Simple Regression? 12 • Simple Regression analyzes the relationship between two variables. It ifi d d t ( ) i bl d• It specifies one dependent (response) variable and one independent (predictor) variable.
  • 11. • This hypothesized relationship here will be linear• This hypothesized relationship here will be linear. 12-12 C ha 12 2 Si l R i12 2 Si l R i pter 1 12.2 Simple Regression12.2 Simple RegressionLO12LO12--22 LO12LO12 2:2: Interpret the slope and intercept of a regression equationInterpret the slope and intercept of a regression equation Interpreting an Estimated Regression Equation: ExamplesInterpreting an Estimated Regression Equation: Examples 12LO12LO12--2: 2: Interpret the slope and intercept of a regression equation.Interpret the slope and intercept of a regression equation. 12-13 C ha 12 2 Si l R i12 2 Si l R i pter 1 12.2 Simple Regression12.2 Simple RegressionLO12LO12--33
  • 12. LO12LO12 33 Prediction Using Regression: ExamplesPrediction Using Regression: Examples 12LO12LO12--3: 3: Make a prediction for a given Make a prediction for a given x value using a x value using a regression equation.regression equation. g g pg g p 12-14 C ha 12 2 Si l R i12 2 Si l R i pter 1 12.2 Simple Regression12.2 Simple Regression NOTES:NOTES: 12 12-15 C ha 12 3 R i T i l12 3 R i T i l M d l d P tM d l d P t
  • 13. pter 1 12.3 Regression Terminology12.3 Regression Terminology Model and ParametersModel and Parameters 12 • The assumed model for a linear relationship is • The relationship holds for all pairs (xi , yi ). independently normally distributed with mean of 0 and standard deviation • The unknown parameters are: 12-16 C ha 12 3 R i T i l12 3 R i T i l pter 1 12.3 Regression Terminology12.3 Regression Terminology Model and ParametersModel and Parameters
  • 14. 12 •• The The fitted model fitted model oror regression model regression model is used to predict the is used to predict the expectedexpected value of value of YY for a given value of for a given value of XX isis •• TheThe fitted coefficientsfitted coefficients areareThe The fitted coefficientsfitted coefficients areare b0 the estimated intercept b1 the estimated slope 12-17 C ha 12 3 R i T i l12 3 R i T i l pter 1 LO12LO12--44 12.3 Regression Terminology12.3 Regression Terminology 12 LO12LO12--4: 4: Fit a simple regression on an Excel scatter plot.Fit a simple regression on an Excel scatter plot. A more precise method is to let Excel calculate the estimates. We enter observations on the independent variable x1, x2, . . ., xn and the dependent variable y1, y2, . . ., yn into separate columns and let Excel fi t theseparate columns, and let Excel fi t the
  • 15. regression equation, as illustrated in Figure 12.6. Excel will choose the regression coefficients so as to produce a good fi t 12-18 C ha 12 3 R i T i l12 3 R i T i l pter 1 LO12LO12--44 12.3 Regression Terminology12.3 Regression Terminology 12 Slope and Intercept InterpretationsSlope and Intercept Interpretations • Figure 12 6 (previous slide) shows a sample of miles per gallon and• Figure 12.6 (previous slide) shows a sample of miles per gallon and horsepower for 15 engines. The Excel graph and its fitted regression equation are also shown. • Slope Interpretation: The slope of -0.0785 says that for each additional unit of engine horsepower, the miles per gallon decreases by 0.0785 mile. This estimated slope is a statistic because a different sample might yield aThis estimated slope is a statistic because a different sample might yield a
  • 16. different estimate of the slope. • Intercept Interpretation: The intercept value of 49.216 suggests that when the engine has no horsepower , the fuel efficiency would be quite high.the engine has no horsepower , the fuel efficiency would be quite high. However, the intercept has little meaning in this case, not only because zero horsepower makes no logical sense, but also because extrapolating to x = 0 is beyond the range of the observed data.y g 12-19 C ha 12 4 Ordinary Least Squares (OLS)12 4 Ordinary Least Squares (OLS) pter 1 12.4 Ordinary Least Squares (OLS) 12.4 Ordinary Least Squares (OLS) FormulasFormulas Slope and InterceptSlope and Intercept 12 •• The The ordinary least squaresordinary least squares method (method (OLSOLS) estimates the slope ) estimates the slope d i t t f th i li th t th f id l id i t t f th i li th t th f id l iand intercept of the regression line so that the sum of residuals is and intercept of the regression line so that the sum of residuals
  • 17. is minimized.minimized. •• The sum of the residuals = 0The sum of the residuals = 0•• The sum of the residuals = 0.The sum of the residuals = 0. •• The sum of the squared residuals is The sum of the squared residuals is SSE.SSE. 12-20 C ha 12 4 Ordinary Least Squares (OLS)12 4 Ordinary Least Squares (OLS) pter 1 12.4 Ordinary Least Squares (OLS) 12.4 Ordinary Least Squares (OLS) FormulasFormulas ThTh OLSOLS ti t f th l iti t f th l i Slope and InterceptSlope and Intercept 12 •• The The OLSOLS estimator for the slope is:estimator for the slope is: oror ThTh OLSOLS ti t f th i t t iti t f th i t t i•• The The OLSOLS estimator for the intercept is:estimator for the intercept is:
  • 18. Assessing Fit • We want to explain
  • 19. the total variation in Y around its mean (SST for Total Sum of Squares). • The regression sum of squares (SSR) is the explained variation in Y. • The error sum of squares (SSE) is the unexplained variation in Y. • If the fit is good, SSE will be relatively small compared to SST. • A perfect fit is indicated by an
  • 20. SSE = 0. • The magnitude of SSE depends on n and on the units of measurement. Coefficient of Determination • R² is a measure of relative fit based on a comparison of SSR and SST. • Often expressed as a percent, an R² = 1 (i.e., 100%) indicates
  • 21. perfect fit. • In simple regression, R² = (r)². 12.5 Tests for Significance (LO12-5). LO12-5: Calculate and interpret confidence intervals for regression coefficients. Standard Error of Regression • The standard error (s) is an overall measure of model fit. • If the fitted model's predictions are perfect (SSE = 0), then s = 0. Thus, a small s indicates a better fit. • Used to construct confidence intervals. • Magnitude of s depends on the units of
  • 22. measurement of Y and on data magnitude. Confidence Intervals for Slope and Intercept • Standard error of the slope and intercept: Confidence
  • 23. Intervals for Slope and Intercept • Confidence interval for the true slope and intercept: • Note: One can use Excel, Minitab, MegaStat or other software to compute these intervals and do hypothesis tests relating to linear regression. LO12-6: Test hypotheses about the slope and intercept by using t tests. Hypothesis Tests • If β1 = 0, then X cannot influence Y and the regression model collapses to a constant β0 plus random error. • The hypotheses to be tested
  • 24. are: H0: β1 = 0 versus H1: β1 ≠ 0 for the slope (and analogously for the intercept), using a t test with df = n − 2; reject H0 if the test statistic falls beyond the critical value or if the p-value is sufficiently small. 12.6 Analysis of Variance: Overall Fit (LO12-8). LO12-8: Interpret the standard error, R², ANOVA table, and F test. F Test for Overall Fit • To test a regression for overall significance, we use an F test to compare the explained (SSR) and unexplained (SSE) sums of squares.
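The F test can be sketched directly from the sums of squares. This example assumes the usual simple-regression decomposition SST = SSR + SSE, with F = (SSR / 1) / (SSE / (n − 2)) and standard error s = sqrt(SSE / (n − 2)); the helper name `anova_simple` and the data are illustrative assumptions, not values from the slides.

```python
import math


def anova_simple(x, y):
    """Sums of squares, R^2, standard error, and F for a simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation

    mse = sse / (n - 2)      # error mean square, df = n - 2
    f_stat = (ssr / 1) / mse  # regression mean square has df = 1
    return {
        "sst": sst, "ssr": ssr, "sse": sse,
        "r2": ssr / sst,
        "s": math.sqrt(mse),
        "f": f_stat,
    }


# Hypothetical illustration data (an assumption)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
result = anova_simple(x, y)
print(result)
```

A large F (explained variation large relative to unexplained variation) points toward overall significance; the p-value would come from an F table or software with 1 and n − 2 degrees of freedom.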
  • 25. 12.7 Confidence and Prediction Intervals for Y (LO12-9). LO12-9: Distinguish between confidence and prediction intervals for Y. How to Construct an Interval Estimate for Y • Confidence interval for the conditional mean of Y. • Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.
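A hedged sketch of the two intervals, assuming the standard simple-regression forms: ŷ ± t·s·sqrt(1/n + (x − x̄)²/Σ(xi − x̄)²) for the conditional mean of Y, and ŷ ± t·s·sqrt(1 + 1/n + (x − x̄)²/Σ(xi − x̄)²) for an individual Y. The function name `interval_estimates`, the data, and the critical value t = 3.182 (two-tailed 95%, df = 3, read from a t table) are assumptions for illustration.

```python
import math


def interval_estimates(x, y, x_new, t_crit):
    """Return (y_hat, confidence_interval, prediction_interval) at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the regression

    y_hat = b0 + b1 * x_new
    core = 1 / n + (x_new - x_bar) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(core)      # interval for the mean of Y
    pi_half = t_crit * s * math.sqrt(1 + core)  # interval for an individual Y
    return (
        y_hat,
        (y_hat - ci_half, y_hat + ci_half),
        (y_hat - pi_half, y_hat + pi_half),
    )


# Hypothetical illustration data (an assumption)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
y_hat, conf_int, pred_int = interval_estimates(x, y, x_new=4.0, t_crit=3.182)
print(y_hat, conf_int, pred_int)
```

The extra 1 under the square root is what makes the prediction interval wider: it adds the variability of an individual observation to the uncertainty in the estimated mean.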
  • 26. 12.8 Residual Tests (LO12-10). LO12-10: Test residuals for violations of regression assumptions. Three Important Assumptions 1. The errors are normally distributed. 2. The errors have constant variance (i.e., they are homoscedastic). 3. The errors are independent (i.e., they are nonautocorrelated). Non-normal Errors • Non-normality of errors is a mild violation since the regression parameter estimates b0 and b1 and
  • 27. their variances remain unbiased and consistent. • Confidence intervals for the parameters may be untrustworthy because the normality assumption is used to justify using Student's t distribution. • A large sample size would compensate. • Outliers could pose serious problems.
  • 28. Normal Probability Plot • The Normal Probability Plot tests the assumption H0: Errors are normally distributed versus H1: Errors are not normally distributed. • If H0 is true, the residual probability plot should be linear, as shown in the example. What to Do About Non-Normality? 1. Trim outliers only if they clearly are mistakes.
  • 29. 2. Increase the sample size if possible. 3. Try a logarithmic transformation of both X and Y. Heteroscedastic Errors (Non-constant Variance) • The ideal condition is if the error magnitude is constant (i.e., errors are homoscedastic). • Heteroscedastic errors increase or decrease with X. • In the most
  • 30. common form of heteroscedasticity, the variances of the estimators are likely to be understated. • This results in overstated t statistics and artificially narrow confidence intervals. Tests for Heteroscedasticity • Plot the residuals against X. Ideally, there is no pattern in the residuals moving from left to right.
  • 31. • The “fan-out” pattern of increasing residual variance is the most common pattern indicating heteroscedasticity. What to Do About Heteroscedasticity? • Transform both X and Y, for example, by taking logs. • Although it can widen the confidence intervals for the coefficients, heteroscedasticity does not bias the estimates. Autocorrelated Errors
  • 32. • Autocorrelation is a pattern of non-independent errors. • In a first-order autocorrelation, e_t is correlated with e_(t-1). • The estimated variances of the OLS estimators are biased, resulting in confidence intervals that are too narrow, overstating the model's fit. Runs Test for Autocorrelation • In the runs test, count the number of the residual's sign reversals (i.e., how often does the residual cross the zero centerline?). • If the pattern is random, the number of sign changes should be about n/2. • Fewer than n/2 would suggest positive autocorrelation.
  • 33. • More than n/2 would suggest negative autocorrelation. Durbin-Watson (DW) Test • Tests for autocorrelation under the hypotheses H0: Errors are non-autocorrelated versus H1: Errors are autocorrelated. • The DW statistic will range from 0 to 4: DW < 2 suggests positive autocorrelation; DW = 2 suggests no autocorrelation (ideal); DW > 2 suggests negative autocorrelation. What to Do About Autocorrelation? • Transform both variables using the method of first differences in
  • 34. which both variables are redefined as changes. Then we regress the change in Y against the change in X. • Although it can widen the confidence interval for the coefficients, autocorrelation does not bias the estimates. 12.9 Unusual Observations (LO12-11). LO12-11: Identify unusual residuals and high leverage observations. Standardized Residuals • One can use Excel, Minitab, MegaStat or other software to compute standardized residuals.
  • 35. • If the absolute value of any standardized residual is at least 2, then it is classified as unusual. Leverage and Influence • A high leverage statistic indicates the observation is far from the mean of X. • These observations are influential because they are at the “end of the lever.” • The leverage for observation i is denoted h_i.
  • 36. • A leverage that exceeds 3/n is unusual. 12.10 Other Regression Problems. Outliers. Outliers may be caused by - an error in recording data - impossible data - an observation that has been influenced by an unspecified “lurking” variable that should have been controlled but wasn't. To fix the problem, - delete the observation(s) - delete the data - formulate a multiple regression model that includes the lurking
  • 37. variable. Model Misspecification • If a relevant predictor has been omitted, then the model is misspecified. • Use multiple regression instead of bivariate regression. Ill-Conditioned Data
  • 38. • Well-conditioned data values are of the same general order of magnitude. • Ill-conditioned data have unusually large or small data values and can cause loss of regression accuracy or awkward estimates. • Avoid mixing magnitudes by adjusting the magnitude of your data before running the regression. Spurious Correlation • In a spurious correlation two variables appear related because of the way they are defined. • This problem is called the size effect or problem of totals.
  • 39. Model Form and Variable Transforms • Sometimes a nonlinear model is a better fit than a linear model. • Excel offers many model forms. • Variables may be transformed (e.g., logarithmic or exponential functions) in order to provide a better fit. • Log transformations reduce heteroscedasticity.
  • 40. • Nonlinear models may be difficult to interpret. Chapter 10: Two-Sample Hypothesis Tests. Chapter Contents 10.1 Two-Sample Tests 10.2
  • 41. Comparing Two Means: Independent Samples 10.3 Confidence Interval for the Difference of Two Means, μ1 − μ2 10.4 Comparing Two Means: Paired Samples 10.5 Comparing Two Proportions 10.6 Confidence Interval for the Difference of Two Proportions, π1 − π2 10.7 Comparing Two Variances. Chapter Learning Objectives (LO's) LO10-1: Recognize and perform a test for two means with known σ1 and σ2.
  • 42. LO10-2: Recognize and perform a test for two means with unknown σ1 and σ2. LO10-3: Recognize paired data and be able to perform a paired t test. LO10-4: Explain the assumptions underlying the two-sample test of means. LO10-5: Perform a test to compare two proportions using z.
  • 43. LO10-6: Check whether normality may be assumed for two proportions. LO10-7: Use Excel to find p-values for two-sample tests using z or t. LO10-8: Carry out a test of two variances using the F distribution. LO10-9: Construct a confidence interval for μ1 − μ2 or π1 − π2 (optional). 10.1 Two-Sample Tests. What Is a Two-Sample Test?
  • 44. • A two-sample test compares two sample estimates with each other. • A one-sample test compares a sample estimate to a non-sample benchmark. Basis of Two-Sample Tests • Two-sample tests are especially useful because they possess a built-in point of comparison. • The logic of two-sample tests is based on the fact that two samples drawn from the same population may yield different estimates of a parameter due to chance.
  • 45. • If the two sample statistics differ by more than the amount attributable to chance, then we conclude that the samples came from populations with different parameter values. Test Procedure • State the hypotheses • Set up the decision rule • Insert the sample statistics • Make a decision based on the critical values or using p-values
  • 46. 10.2 Comparing Two Means: Independent Samples (LO10-1). LO10-1: Recognize and perform a test for two means with known σ1 and σ2. Format of Hypotheses • The hypotheses for comparing two independent population means µ1 and µ2 are:
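As a sketch, the usual three formats for comparing two means are shown here (left-tailed, two-tailed, right-tailed). This is the standard textbook layout and is assumed for illustration rather than taken from the slide.

```latex
\begin{array}{lll}
\text{Left-tailed} & \text{Two-tailed} & \text{Right-tailed} \\
H_0:\ \mu_1 - \mu_2 \ge 0 & H_0:\ \mu_1 - \mu_2 = 0 & H_0:\ \mu_1 - \mu_2 \le 0 \\
H_1:\ \mu_1 - \mu_2 < 0  & H_1:\ \mu_1 - \mu_2 \ne 0 & H_1:\ \mu_1 - \mu_2 > 0
\end{array}
```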
  • 47. Case 1: Known Variances • When the variances are known, use the normal distribution for the test (assuming a normal population). • The test statistic is:
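A minimal sketch of the known-variance case, assuming the standard form z = (x̄1 − x̄2) / sqrt(σ1²/n1 + σ2²/n2) under H0: μ1 − μ2 = 0. The function name `two_mean_z` and the summary statistics are hypothetical.

```python
import math
from statistics import NormalDist


def two_mean_z(x_bar1, x_bar2, sigma1, sigma2, n1, n2):
    """z statistic for H0: mu1 - mu2 = 0 when both population sigmas are known."""
    std_err = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return (x_bar1 - x_bar2) / std_err


# Hypothetical summary statistics, for illustration only
z = two_mean_z(x_bar1=52.0, x_bar2=50.0, sigma1=4.0, sigma2=3.0, n1=25, n2=25)

# Two-tailed p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(z, p_value)
```

For a two-tailed test at the 5 percent level, H0 would be rejected when the p-value falls below 0.05.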
  • 48. LO10-2: Recognize and perform a test for two means with unknown σ1 and σ2. Case 2: Unknown Variances, Assumed Equal • Since the variances are unknown, they must be estimated and the Student's t distribution used to test the means. • Assuming the population variances are equal, s1² and s2² can be used to estimate a common pooled variance sp².
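A hedged sketch of Case 2, assuming the standard pooled-variance form sp² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2), with t = (x̄1 − x̄2) / sqrt(sp²/n1 + sp²/n2) and df = n1 + n2 − 2. The function name `pooled_t` and both samples are hypothetical.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Return (t statistic, pooled variance, df) assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = statistics.variance(sample1)  # sample variance (n - 1 divisor)
    s2_sq = statistics.variance(sample2)
    # Weighted average of the two sample variances
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    std_err = math.sqrt(sp_sq / n1 + sp_sq / n2)
    t_stat = (statistics.mean(sample1) - statistics.mean(sample2)) / std_err
    return t_stat, sp_sq, n1 + n2 - 2


# Hypothetical samples, for illustration only
group_a = [10.0, 12.0, 14.0, 16.0, 18.0]
group_b = [9.0, 11.0, 13.0]
t_stat, sp_sq, df = pooled_t(group_a, group_b)
print(t_stat, sp_sq, df)
```

The pooled variance always lies between the two sample variances, weighted toward the larger sample.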
  • 49. Case 3: Unknown Variances, Assumed Unequal • Welch-Satterthwaite test • A Quick Rule for degrees of freedom is to use min(n1 – 1, n2 – 1).
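A sketch of Case 3, assuming the standard Welch statistic t = (x̄1 − x̄2) / sqrt(s1²/n1 + s2²/n2) and the Welch-Satterthwaite degrees of freedom (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)], shown next to the Quick Rule. The function name `welch_t` and the samples are hypothetical.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Return (t statistic, Welch-Satterthwaite df, quick-rule df)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    t_stat = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (usually not an integer)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Quick Rule: conservative lower bound on the degrees of freedom
    df_quick = min(n1 - 1, n2 - 1)
    return t_stat, df_welch, df_quick


# Hypothetical samples, for illustration only
group_a = [10.0, 12.0, 14.0, 16.0, 18.0]
group_b = [9.0, 11.0, 13.0]
t_stat, df_welch, df_quick = welch_t(group_a, group_b)
print(t_stat, df_welch, df_quick)
```

The Quick Rule never exceeds the Welch-Satterthwaite value, so using it makes the test somewhat more conservative.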
  • 50. Summary for the Test Statistic • If the population variances σ1² and σ2² are known, then use the normal distribution. • If population variances are unknown and estimated using s1² and s2², then use the Student's t distribution.
  • 51. Steps in Testing Two Means • Step 1: State the hypotheses. • Step 2: Specify the decision rule and determine the critical value(s). • Step 3: Calculate the test statistic. • Step 4: Make the decision. Reject H0 if the test statistic falls in the rejection region(s) as defined by the critical value(s). • Step 5: Take action based on the decision.
  • 52. Which Assumption Is Best? • If the sample sizes are equal, the Case 2 and Case 3 test statistics will be identical, although the degrees of freedom may differ. • If the variances are similar, the two tests will usually agree. • If no information about the population variances is available, then the best choice is Case 3. • The fewer assumptions, the better. Must Sample Sizes Be Equal? • Unequal sample sizes are common and the formulas still apply.
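The claim about equal sample sizes can be checked numerically. This sketch assumes the standard pooled (Case 2) and unpooled (Case 3) t statistics; the helper names and all three samples are hypothetical.

```python
import math
import statistics


def t_pooled(a, b):
    """Case 2 statistic: pooled variance, equal variances assumed."""
    n1, n2 = len(a), len(b)
    sp_sq = ((n1 - 1) * statistics.variance(a)
             + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp_sq / n1 + sp_sq / n2)


def t_unpooled(a, b):
    """Case 3 statistic: separate variances, no equality assumption."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical samples, for illustration only
a = [10.0, 12.0, 14.0, 16.0, 18.0]
b_equal_n = [9.0, 11.0, 13.0, 12.0, 20.0]  # same size as a
b_unequal_n = [9.0, 11.0, 13.0]            # smaller than a

same = math.isclose(t_pooled(a, b_equal_n), t_unpooled(a, b_equal_n))
differ = not math.isclose(t_pooled(a, b_unequal_n), t_unpooled(a, b_unequal_n))
print(same, differ)
```

With equal n the pooled standard error reduces algebraically to sqrt((s1² + s2²)/n), which is the unpooled standard error, so only the degrees of freedom can differ between the two cases.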
  • 53. Large Samples • For unknown variances, if both samples are large (n1 ≥ 30 and n2 ≥ 30) and the population is not badly skewed, use the following formula with Appendix C. Caution: Three Issues 1. Are the populations skewed? Are there outliers? Check using histograms and/or dot plots of each sample. t tests are OK if moderately skewed, especially if samples are large. Outliers are more serious.
  • 54. 2. Are the sample sizes large (n ≥ 30)? If samples are small, the mean is not a reliable indicator of central tendency and the test may lack power. 3. Is the difference important as well as significant? A small difference in means or proportions could be significant if the sample size is large.
  • 55. 10.3 Confidence Interval for the Difference of Two Means, μ1 − μ2 (LO10-9). LO10-9: Construct a confidence interval for μ1 − μ2 or π1 − π2 (optional). Confidence Intervals for the Difference of Two Means
  • 57. 10.4 Comparing Two Means: Paired Samples (LO10-3). LO10-3: Recognize paired data and be able to perform a paired t test. Paired Data • Data occur in matched pairs when the same item is observed twice but under different circumstances. • For example, blood pressure is taken before and after a treatment is given. • Paired data are typically displayed in columns.
  • 58. Paired t Test • Paired data typically come from a before/after experiment. • In the paired t test, the difference between x1 and x2 is measured as d = x1 – x2. • The mean and standard deviation for the differences d are given below. • The test statistic is just the one-sample t statistic applied to the differences.
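A minimal sketch of the paired t test, assuming the standard one-sample form on the differences: t = (d̄ − 0) / (s_d / sqrt(n)) with df = n − 1, where d̄ and s_d are the mean and sample standard deviation of d = x1 − x2. The function name `paired_t` and the before/after readings are hypothetical.

```python
import math
import statistics


def paired_t(x1, x2, mu_d0=0.0):
    """Return (t statistic, mean difference, sd of differences, df) for paired data."""
    diffs = [a - b for a, b in zip(x1, x2)]  # d = x1 - x2 for each pair
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)            # sample sd (n - 1 divisor)
    t_stat = (d_bar - mu_d0) / (s_d / math.sqrt(n))
    return t_stat, d_bar, s_d, n - 1


# Hypothetical before/after readings, for illustration only
before = [120.0, 130.0, 125.0, 140.0, 135.0]
after = [115.0, 128.0, 120.0, 132.0, 130.0]
t_stat, d_bar, s_d, df = paired_t(before, after)
print(t_stat, d_bar, s_d, df)
```

Working with the differences removes the between-subject variation, which is why a paired design can detect a change that an independent-samples test on the same numbers might miss.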
  • 59. Steps in Testing Paired Data • Step 1: State the hypotheses, for example H0: µd = 0 versus H1: µd ≠ 0. • Step 2: Specify the decision rule. Determine the critical values from Appendix D or with use of technology. • Step 3: Calculate the test statistic t. • Step 4: Make the decision. Reject H0 if the test statistic falls in the rejection region(s) as defined by the critical values.
  • 60. 10.4 Comparing Two Means: Paired Samples (LO10-3). Analogy to Confidence Interval • A two-tailed test for a zero difference is equivalent to asking whether the confidence interval for the true mean difference µd includes zero. 10.5 Comparing Two Proportions. LO10-5: Perform a test to compare two proportions using z.
  • 61. 10.5 Comparing Two Proportions (LO10-5). Testing for Zero Difference: hypotheses. Sample Proportions.
  • 62. 10.5 Comparing Two Proportions (LO10-5). Pooled Proportion • If H0 is true, there is no difference between the two population proportions, so the samples are pooled to estimate the common population proportion. Test Statistic • If the samples are large, p1 – p2 may be assumed normally distributed. • The test statistic is the difference of the sample proportions divided by the standard error of the difference.
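As a rough sketch of the pooled-proportion test statistic described above, the following Python computes the sample proportions, the pooled proportion, and z. The function name `two_proportion_z` and the counts are assumptions for illustration. The standard-error expression is the usual pooled form, √[p̄(1 – p̄)(1/n1 + 1/n2)], which the transcript itself does not show.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (p1, p2, pooled, z, two_tailed_p) for a test of zero difference."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                        # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # pooled standard error
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-tailed p-value
    return p1, p2, pooled, z, p_value


# Hypothetical counts: 60 successes in 200 trials vs. 45 successes in 250 trials.
p1, p2, pooled, z_calc, p_value = two_proportion_z(60, 200, 45, 250)
print(f"p1={p1:.3f}  p2={p2:.3f}  pooled={pooled:.4f}  z={z_calc:.3f}  p={p_value:.4f}")
```

Using `statistics.NormalDist` keeps the sketch free of third-party dependencies; a statistics package would normally report the same z and p-value.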
  • 63. 10.5 Comparing Two Proportions (LO10-5). • The standard error is calculated by using the pooled proportion. Steps in Testing Two Proportions • Step 1: State the hypotheses. • Step 2: Specify the decision rule and find the critical value(s). • Step 3: Calculate the test statistic; use a pooled estimate of the common proportion. • Step 4: Make the decision. Reject H0 if the test statistic falls in the rejection region(s) as defined by the critical value(s).
  • 64. 10.5 Comparing Two Proportions. LO10-6: Check whether normality may be assumed for two proportions. Testing for Zero Difference: Checking for Normality • We have assumed a normal distribution for the statistic p1 – p2. • This assumption can be checked. • If either sample proportion is not normal, their difference cannot safely be assumed normal.
  • 65. • The sample size rule of thumb is equivalent to requiring that each sample contains at least 10 “successes” and at least 10 “failures.” Testing for Non-Zero Difference. 10.6 Confidence Interval for the Difference • If the confidence interval does not include 0, then we will reject the null hypothesis of no difference in the proportions.
  • 66. 10.7 Comparing Two Variances. LO10-8: Carry out a test of two variances using the F distribution. Format of Hypotheses • To test whether two population means are equal, we may also need to test whether two population variances are equal.
  • 67. 10.7 Comparing Two Variances (LO10-8). The F Test • The test statistic is the ratio of the sample variances: Fcalc = s1²/s2². • If the variances are equal, this ratio should be near unity: F = 1. • If the test statistic is far below 1 or above 1, we would reject the hypothesis of equal population variances.
  • 68. 10.7 Comparing Two Variances (LO10-8). The F Test • The numerator s1² has degrees of freedom df1 = n1 – 1 and the denominator s2² has degrees of freedom df2 = n2 – 1. • The F distribution is skewed, with mean > 1 and mode < 1. The F Test: Critical Values • Critical values for the F test are denoted FL (left tail) and FR (right tail). • A right-tail critical value FR may be found from Appendix F using df1 and df2 degrees of freedom: FR = F(df1, df2). • A left-tail critical value FL may be found by reversing the numerator and denominator degrees of freedom, finding the critical value from Appendix F, and taking its reciprocal: FL = 1/F(df2, df1).
  • 69. 10.7 Comparing Two Variances (LO10-8). Steps in Testing Two Variances • Step 1: State the hypotheses, for example H0: σ1² = σ2² versus H1: σ1² ≠ σ2². • Step 2: Specify the decision rule. Degrees of freedom are numerator df1 = n1 – 1 and denominator df2 = n2 – 1. Choose α and find the left-tail and right-tail critical values from Appendix F.
  • 70. 10.7 Comparing Two Variances (LO10-8). Steps in Testing Two Variances • Step 3: Calculate the test statistic Fcalc = s1²/s2². • Step 4: Make the decision. Reject H0 if the test statistic falls in the rejection regions as defined by the critical values FL and FR.
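The two-tailed F test steps above might be sketched as follows. The sample data and the helper names `f_statistic` and `two_tailed_decision` are hypothetical. The value 5.82 is the commonly tabulated right-tail F value for (6, 6) degrees of freedom at α/2 = .025, and it is passed in as an argument rather than computed, mirroring the table lookup the slides describe.

```python
from statistics import variance


def f_statistic(sample1, sample2):
    """Return (F_calc, df1, df2) where F_calc = s1^2 / s2^2."""
    s1_sq = variance(sample1)        # sample variance, divisor n1 - 1
    s2_sq = variance(sample2)        # sample variance, divisor n2 - 1
    return s1_sq / s2_sq, len(sample1) - 1, len(sample2) - 1


def two_tailed_decision(f_calc, f_table_df1_df2, f_table_df2_df1):
    """Return (reject, FL, FR) given right-tail table values at alpha/2.

    FR is the table value for (df1, df2); FL is the reciprocal of the
    table value looked up with the degrees of freedom reversed.
    """
    f_right = f_table_df1_df2
    f_left = 1.0 / f_table_df2_df1
    return (f_calc < f_left or f_calc > f_right), f_left, f_right


# Hypothetical samples of size 7 each (illustrative only).
sample_a = [12, 15, 11, 17, 14, 16, 13]
sample_b = [14, 13, 15, 14, 16, 13, 13]

f_calc, df1, df2 = f_statistic(sample_a, sample_b)
reject_h0, f_left, f_right = two_tailed_decision(f_calc, 5.82, 5.82)
print(f"F={f_calc:.3f}  df=({df1},{df2})  FL={f_left:.4f}  FR={f_right:.2f}  reject={reject_h0}")
```

When df1 equals df2, as here, the same table value serves for both tails; with unequal sample sizes the two lookups differ.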
  • 71. 10.7 Comparing Two Variances (LO10-8). Comparison of Variances: One-Tailed Test • Step 1: State the hypotheses. • Step 2: State the decision rule. Degrees of freedom are numerator df1 = n1 – 1 and denominator df2 = n2 – 1. Choose α and find the left-tail critical value from Appendix F. • Step 3: Calculate the test statistic Fcalc = s1²/s2². • Step 4: Make the decision. Reject H0 if the test statistic falls in the left-tail rejection region as defined by the critical value.
  • 72. 10.7 Comparing Two Variances (LO10-8). Excel’s F Test. Assumptions of the F Test
  • 73. • The F test assumes that the populations being sampled are normal. • It is sensitive to non-normality of the sampled populations. • MINITAB reports both the F test and an alternative Levene’s test and p-values. Chapter 8: Sampling Distributions and Estimation. Chapter Contents: 8.1 Sampling Variation; 8.2 Estimators and Sampling Errors; 8.3 Sample Mean and the Central Limit Theorem; 8.4 Confidence Interval for a Mean (μ) with Known σ; 8.5 Confidence Interval for a Mean (μ) with Unknown σ; 8.6 Confidence Interval for a Proportion (π); 8.7 Estimating from Finite Populations; 8.8 Sample Size Determination for a Mean; 8.9 Sample Size Determination for a Proportion; 8.10 Confidence Interval for a Population Variance (Optional).
  • 74. Sampling Distributions and Estimation. Chapter Learning Objectives (LO’s). LO8-1: Define sampling error, parameter, and estimator. LO8-2: Explain the desirable properties of estimators. LO8-3: State the Central Limit Theorem for a mean. LO8-4: Explain how sample size affects the standard error. LO8-5: Construct a 90, 95, or 99 percent confidence interval for μ.
  • 75. Sampling Distributions and Estimation. Chapter Learning Objectives (LO’s). LO8-6: Know when to use Student’s t instead of z to estimate μ. LO8-7: Construct a 90, 95, or 99 percent confidence interval for π. LO8-8: Construct confidence intervals for finite populations. LO8-9: Calculate sample size to estimate a mean or proportion. LO8-10: Construct a confidence interval for a variance (optional).
  • 76. 8.1 Sampling Variation • Sample statistic – a random variable whose value depends on which population items are included in the random sample. • Depending on the sample size, the sample statistic could either represent the population well or differ greatly from the population. • This sampling variation can easily be illustrated.
  • 77. 8.1 Sampling Variation • Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants. • The sample means tend to be close to the population mean.
  • 78. 8.1 Sampling Variation • The dot plots show that the sample means have much less variation than the individual sample items. 8.2 Estimators and Sampling Distributions. LO8-1: Define sampling error, parameter, and estimator. Some Terminology • Estimator – a statistic derived from a sample to infer the value of a population parameter. • Estimate – the value of the estimator in a particular sample. • Population parameters are usually represented by Greek letters and the corresponding statistic by Roman letters.
  • 79. 8.2 Estimators and Sampling Distributions (LO8-1). Examples of Estimators. Sampling Distributions • The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken. • Note: An estimator is a random variable since samples vary.
  • 80. 8.2 Estimators and Sampling Distributions (LO8-1). • Sampling error is the difference between an estimate and the corresponding population parameter. For example, if we use the sample mean as an estimate for the population mean, then the sampling error is x̄ – µ. Bias • Bias is the difference between the expected value of the estimator and the true parameter. For the mean, Bias = E(x̄) – µ.
  • 81. 8.2 Estimators and Sampling Distributions (LO8-1). • An estimator is unbiased if its expected value is the parameter being estimated. The sample mean is an unbiased estimator of the population mean since E(x̄) = µ. • On average, an unbiased estimator neither overstates nor understates the true parameter.
  • 82. 8.2 Estimators and Sampling Distributions. LO8-2: Explain the desirable properties of estimators. Efficiency • Efficiency refers to the variance of the estimator’s sampling distribution. • A more efficient estimator has smaller variance (Figure 8.6). • Note: Being unbiased is also a desirable property for an estimator.
  • 83. 8.2 Estimators and Sampling Distributions. LO8-2: Explain the desirable properties of estimators. Consistency • A consistent estimator converges toward the parameter being estimated as the sample size increases (Figure 8.6). 8.3 Sample Mean and the Central Limit Theorem. LO8-3: State the Central Limit Theorem for a mean. • The Central Limit Theorem is a powerful result that allows us to approximate the shape of the sampling distribution of the sample mean even when we don’t know what the population looks like.
  • 84. 8.3 Sample Mean and the Central Limit Theorem (LO8-3). • If the population is exactly normal, then the sample mean follows a normal distribution. • As the sample size n increases, the distribution of sample means narrows in on the population mean µ.
  • 85. 8.3 Sample Mean and the Central Limit Theorem (LO8-3). • If the sample is large enough, the sample means will have approximately a normal distribution even if your population is not normal. Illustrations of Central Limit Theorem: using the uniform and a right-skewed distribution.
  • 86. 8.3 Sample Mean and the Central Limit Theorem (LO8-3). Applying the Central Limit Theorem • The Central Limit Theorem permits us to define an interval within which the sample means are expected to fall. As long as the sample size n is large enough, we can use the normal distribution regardless of the population shape (or any n if the population is normal to begin with).
  • 87. 8.3 Sample Mean and the Central Limit Theorem. LO8-4: Explain how sample size affects the standard error. Sample Size and Standard Error • Even if the population standard deviation σ is large, the sample means will fall within a narrow interval as long as n is large. The key is the standard error of the mean: σx̄ = σ/√n. The standard error decreases as n increases. • For example, when n = 4 the standard error is halved (relative to a single observation). To halve it again requires n = 16, and to halve it again requires n = 64. To halve the standard error, you must quadruple the sample size (the law of diminishing returns).
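The quadrupling rule above is easy to verify numerically. The short sketch below tabulates σ/√n for n = 1, 4, 16, 64; the value σ = 100 is an arbitrary choice for illustration, not a figure from the slides.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean for population std. dev. sigma and sample size n."""
    return sigma / sqrt(n)


sigma = 100.0  # hypothetical population standard deviation
se_table = {n: standard_error(sigma, n) for n in (1, 4, 16, 64)}

for n, se in se_table.items():
    print(f"n = {n:>2}: standard error = {se:g}")
```

Each fourfold increase in n cuts the standard error in half, which is why extra precision becomes progressively more expensive.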
  • 88. 8.3 Sample Mean and the Central Limit Theorem. Illustration: All Possible Samples from a Uniform Population • Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}. • The population parameters are µ = 1.5 and σ = 1.118.
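The population parameters above, and the sampling distribution of the mean for samples of size 2, can be checked by brute-force enumeration. The sketch below assumes sampling with replacement (giving 16 equally likely ordered samples), which is an assumption not stated in the transcript.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

population = [0, 1, 2, 3]
mu = mean(population)        # population mean
sigma = pstdev(population)   # population std. dev. (divisor N)

# Assumption: ordered samples of size 2 drawn with replacement (4 * 4 = 16).
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]
counts = Counter(sample_means)

print(f"mu = {mu}, sigma = {sigma:.3f}")
for value in sorted(counts):
    print(f"x-bar = {value:>3}: {'*' * counts[value]}")
print(f"std. dev. of sample means = {pstdev(sample_means):.4f}")
```

The frequency of each sample mean rises from the extremes to the middle value, and the standard deviation of the 16 means equals σ/√2, in line with the standard error formula.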
  • 89. 8.3 Sample Mean and the Central Limit Theorem. Illustration: All Possible Samples from a Uniform Population • The population is uniform, yet the distribution of all possible sample means of size 2 has a peaked triangular shape. 8.4 Confidence Interval for a Mean (µ) with Known σ. LO8-5: Construct a 90, 95, or 99 percent confidence interval for μ. What is a Confidence Interval?
  • 90. 8.4 Confidence Interval for a Mean (µ) with Known σ (LO8-5). What is a Confidence Interval?
  • 91. 8.4 Confidence Interval for a Mean (µ) with Known σ (LO8-5). Choosing a Confidence Level • A higher confidence level leads to a wider confidence interval. • Greater confidence implies loss of precision (i.e., greater margin of error). • 95% confidence is most often used. (Confidence Intervals for Example 8.2.) Interpretation • A confidence interval either does or does not contain µ.
  • 92. 8.4 Confidence Interval for a Mean (µ) with Known σ (LO8-5). • The confidence level quantifies the risk. • Out of 100 confidence intervals, approximately 95% may contain µ, while approximately 5% might not contain µ, when constructing 95% confidence intervals. When Can We Assume Normality? • If σ is known and the population is normal, we can safely use the formula to compute the confidence interval. • If the population is not known to be normal, a common rule of thumb on sample size permits use of the formula as long as the distribution is approximately symmetric with no outliers. • Larger n may be needed to assume normality if you are sampling from a strongly skewed population or one with outliers.
  • 93. 8.5 Confidence Interval for a Mean (µ) with Unknown σ. LO8-6: Know when to use Student’s t instead of z to estimate µ. Student’s t Distribution • Use the Student’s t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small.
  • 94. 8.5 Confidence Interval for a Mean (µ) with Unknown σ (LO8-6). Student’s t Distribution • t distributions are symmetric and shaped like the standard normal distribution.
  • 95. 8.5 Confidence Interval for a Mean (µ) with Unknown σ (LO8-6). • The t distribution is dependent on the size of the sample. (Figure 8.11: Comparison of Normal and Student’s t.) Degrees of Freedom • Degrees of Freedom (d.f.) is a parameter based on the sample size that is used to determine the value of the t statistic.
  • 96. 8.5 Confidence Interval for a Mean (µ) with Unknown σ (LO8-6). • Degrees of freedom tell how many observations are used to calculate the estimate of σ, less the number of intermediate estimates used in the calculation. The d.f. for the t distribution in this case is given by d.f. = n – 1. • As n increases, the t distribution approaches the shape of the normal distribution. • For a given confidence level, t is always larger than z, so a confidence interval based on t is always wider than if z were used.
  • 97. 8.5 Confidence Interval for a Mean (µ) with Unknown σ (LO8-6). Comparison of z and t • For very small samples, t-values differ substantially from the normal. • As degrees of freedom increase, the t-values approach the normal z-values. • For example, for n = 31, the degrees of freedom are d.f. = 31 – 1 = 30. So for a 90 percent confidence interval, we would use t = 1.697, which is only slightly larger than z = 1.645.
  • 98. 8.5 Confidence Interval for a Mean (µ) with Unknown σ (LO8-6). Example: GMAT Scores Again (Figure 8.13) • Construct a 90% confidence interval for the mean GMAT score of all MBA applicants. x̄ = 510, s = 73.77, n = 20. • Since σ is unknown, use the Student’s t for the confidence interval with d.f. = 20 – 1 = 19.
  • 99. 8.5 Confidence Interval for a Mean (µ) with Unknown σ (LO8-6). • For a 90% confidence interval, use Appendix D to find t0.05 = 1.729 with d.f. = 19. Note: One can use Excel, Minitab, etc. to obtain these values as well as to construct confidence intervals. • The interval is x̄ ± t·s/√n = 510 ± 1.729(73.77/√20) = 510 ± 28.52. We are 90 percent confident that the true mean GMAT score lies within the interval [481.48, 538.52].
  • 100. 8.5 Confidence Interval for a Mean (µ) with Unknown σ (LO8-6). Confidence Interval Width • Confidence interval width reflects the sample size, the confidence level, and the standard deviation. • To obtain a narrower interval and more precision, increase the sample size or lower the confidence level (e.g., from 90% to 80% confidence).
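The GMAT interval and the effect of the confidence level on width can be reproduced with a short sketch. The helper name `t_interval` is hypothetical. The t-values are passed in as table lookups: 1.729 comes from the slides, while 1.328 is the commonly tabulated value for 80% confidence with 19 degrees of freedom and is an addition for illustration.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


# Values from the GMAT example: x-bar = 510, s = 73.77, n = 20, t(.05, 19) = 1.729.
low90, high90 = t_interval(510, 73.77, 20, 1.729)

# Lowering confidence to 80% (table value t(.10, 19) = 1.328) narrows the interval.
low80, high80 = t_interval(510, 73.77, 20, 1.328)

print(f"90% CI: [{low90:.2f}, {high90:.2f}]  width = {high90 - low90:.2f}")
print(f"80% CI: [{low80:.2f}, {high80:.2f}]  width = {high80 - low80:.2f}")
```

The 90% result matches the interval quoted in the slides, and the 80% interval is visibly narrower, at the cost of lower confidence.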
  • 101. 8.5 Confidence Interval for a Mean (µ) with Unknown σ (LO8-6). Using Appendix D • Beyond d.f. = 50, Appendix D shows d.f. in steps of 5 or 10. • If the table does not give the exact degrees of freedom, use the t-value for the next lower degrees of freedom. • This is a conservative procedure since it causes the interval to be slightly wider. • A conservative statistician may use the t distribution for confidence intervals when σ is unknown because using z would underestimate the margin of error.
  • 102. 8.6 Confidence Interval for a Proportion (π). LO8-7: Construct a 90, 95, or 99 percent confidence interval for π. • A proportion is a mean of data whose only values are 0 or 1. Applying the CLT • The distribution of a sample proportion p = x/n is symmetric if π = .50 and, regardless of π, approaches symmetry as n increases.
  • 103. 8.6 Confidence Interval for a Proportion (π) (LO8-7). When is it Safe to Assume Normality of p? • Rule of Thumb: The sample proportion p = x/n may be assumed to be normal if both nπ ≥ 10 and n(1 – π) ≥ 10. (Table 8.9: Sample size to assume normality.)
  • 104. 8.6 Confidence Interval for a Proportion (π) (LO8-7). • Since π is unknown, the confidence interval for π, using p = x/n (assuming a large sample), is p ± z√[p(1 – p)/n]. Example: Auditing. 8.7 Estimating from Finite Populations. LO8-8: Construct confidence intervals for finite populations. N = population size; n = sample size.
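The large-sample interval for π, together with a finite-population adjustment, might look like this in Python. The function name `proportion_interval` and the counts are hypothetical. The correction factor √[(N – n)/(N – 1)] is the standard finite population correction; it is assumed here because the transcript does not show the slide's formula.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, confidence=0.95, population_size=None):
    """Return (lower, upper) for p +/- z * sqrt(p(1-p)/n).

    If population_size (N) is given, the standard error is multiplied by
    the finite population correction sqrt((N - n) / (N - 1)).
    """
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p * (1 - p) / n)
    if population_size is not None:
        se *= sqrt((population_size - n) / (population_size - 1))
    return p - z * se, p + z * se


# Hypothetical audit sample: 30 "successes" in n = 100 (at least 10 of each outcome).
x, n = 30, 100
normality_ok = x >= 10 and (n - x) >= 10

low, high = proportion_interval(x, n)                                # very large population
low_fpc, high_fpc = proportion_interval(x, n, population_size=500)   # N = 500

print(f"rule of thumb satisfied: {normality_ok}")
print(f"95% CI:          [{low:.4f}, {high:.4f}]")
print(f"95% CI, N = 500: [{low_fpc:.4f}, {high_fpc:.4f}]")
```

Sampling a sizeable fraction of a finite population shrinks the standard error, so the corrected interval is narrower than the uncorrected one.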
  • 105. 8.8 Sample Size Determination for a Mean. LO8-9: Calculate sample size to estimate a mean or proportion. • To estimate a population mean with a precision of ± E (allowable error), you would need a sample of size n = (zσ/E)².
  • 106. • Method 1: Take a Preliminary Sample. Take a small preliminary sample and use the sample s in place of σ in the sample size formula. • Method 2: Assume Uniform Population. Estimate rough upper and lower limits a and b and set σ = [(b − a)²/12]^½. • Method 3: Assume Normal Population. Estimate rough upper and lower limits a and b and set σ = (b − a)/4. This assumes normality with most of the data within μ ± 2σ, so the range is 4σ. • Method 4: Poisson Arrivals
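A short Python sketch shows how the choice of σ estimate changes the required sample size. The limits a = 20 and b = 60 and the allowable error E = 2 are hypothetical values chosen for illustration, and `sample_size_mean` is an assumed helper name; the formula is n = (zσ/E)², rounded up.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_mean(sigma, error, confidence=0.95):
    """Sample size needed to estimate a mean to within +/- error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / error) ** 2)       # always round up


# Hypothetical rough limits for the data
a, b = 20.0, 60.0

sigma_uniform = sqrt((b - a) ** 2 / 12)         # Method 2: uniform population
sigma_normal = (b - a) / 4                      # Method 3: normal population

n_uniform = sample_size_mean(sigma_uniform, error=2.0)
n_normal = sample_size_mean(sigma_normal, error=2.0)
print(n_uniform, n_normal)
```

Because the uniform assumption gives a larger σ than the normal assumption for the same limits, it also asks for a larger sample.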
  • 107. 8.9 Sample Size Determination for a Proportion LO8-9 • To estimate a population proportion with a precision of ± E (allowable error), you would need a sample of size n = (z/E)² π(1 − π). • Since π is a number between 0 and 1, the allowable error E is also between 0 and 1. • Method 1: Assume that π = .50. This conservative method ensures the desired precision.
  • 108. However, the sample may end up being larger than necessary. • Method 2: Take a Preliminary Sample. Take a small preliminary sample and use the sample p in place of π in the sample size formula. • Method 3: Use a Prior Sample or Historical Data. How often are such samples available? Unfortunately, π might be different enough to make it a questionable assumption. 8.10 Confidence Interval for a Population Variance (σ²) LO8-10: Construct a confidence interval for a variance (optional).
  • 109. Chi-Square Distribution • If the population is normal, then the sample variance s² follows the chi-square distribution (χ²) with degrees of freedom d.f. = n − 1. • Lower (χ²_L) and upper (χ²_U) tail percentiles for the chi-square distribution can be found using Appendix E. Confidence Interval • Using the sample variance s², the confidence interval is (n − 1)s²/χ²_U ≤ σ² ≤ (n − 1)s²/χ²_L. • To obtain a confidence interval for the standard deviation σ, take
  • 110. the square root of the interval bounds. You can use Appendix E to find critical chi-square values. Caution: Assumption of Normality • The methods described for confidence interval estimation of the variance and standard deviation depend on the population having a normal distribution. • If the population does not have a normal distribution, then the
  • 111. confidence interval should not be considered accurate. Chapter 7: Continuous Probability Distributions. Chapter Contents: 7.1 Describing a Continuous Distribution; 7.2 Uniform Continuous Distribution; 7.3 Normal Distribution; 7.4 Standard Normal Distribution; 7.5 Normal Approximations; 7.6 Exponential Distribution; 7.7 Triangular Distribution (Optional)
  • 112. Chapter Learning Objectives (LO's) LO7-1: Define a continuous random variable. LO7-2: Calculate uniform probabilities. LO7-3: Know the form and parameters of the normal distribution. LO7-4: Find the normal probability for given z or x using tables or Excel. LO7-5: Solve for z or x for a given normal probability using tables or Excel.
  • 113. LO7-6: Use the normal approximation to a binomial or a Poisson distribution. LO7-7: Find the exponential probability for a given x. LO7-8: Solve for x for given exponential probability. LO7-9: Use the triangular distribution for "what-if" analysis (optional).
  • 114. 7.1 Describing a Continuous Distribution LO7-1: Define a continuous random variable. Events as Intervals • Discrete Variable: each value of X has its own probability P(X). • Continuous Variable: events are intervals and probabilities are areas under continuous curves. A single point has no probability.
  • 115. PDF: Probability Density Function. Continuous PDF's: • Denoted f(x) • Must be nonnegative • Total area under curve = 1 • Mean, variance and shape depend on the PDF parameters • Reveals the shape of the distribution
  • 116. CDF: Cumulative Distribution Function. Continuous CDF's: • Denoted F(x) • Shows P(X ≤ x), the cumulative proportion of scores • Useful for finding probabilities
  • 117. Probabilities as Areas. Continuous probability functions: • Unlike discrete distributions, the probability at any single point = 0. • The entire area under any PDF, by definition, is set to 1. • Mean is the balance point of the distribution. Expected Value and Variance
  • 118. The mean and variance of a continuous random variable are analogous to E(X) and Var(X) for a discrete random variable: E(X) = μ = ∫ x f(x) dx and Var(X) = σ² = ∫ (x − μ)² f(x) dx. Here the integral sign replaces the summation sign. Calculus is required to compute the integrals. 7.2 Uniform Continuous Distribution LO7-2: Calculate uniform probabilities. Characteristics of the Uniform Distribution • If X is a random variable that is uniformly distributed between a and b, its PDF has constant height. • Denoted U(a, b) • Area =
  • 119. base × height = (b − a) × 1/(b − a) = 1. Example: Anesthesia Effectiveness
  • 120. • An oral surgeon injects a painkiller prior to extracting a tooth. Given the varying characteristics of patients, the dentist views the time for anesthesia effectiveness as a uniform random variable that takes between 15 minutes and 30 minutes. • X is U(15, 30). • With a = 15 and b = 30, find the mean and standard deviation. • Find the probability that the anesthetic takes between 20 and 25 minutes to become effective.
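The anesthesia questions can be worked in Python as a quick check. This sketch assumes the standard uniform formulas, mean = (a + b)/2 and standard deviation = √((b − a)²/12), which the transcript does not show explicitly; `uniform_prob` is a hypothetical helper name.

```python
from math import sqrt

a, b = 15.0, 30.0                       # X ~ U(15, 30), in minutes

mean = (a + b) / 2                      # midpoint of the interval
std_dev = sqrt((b - a) ** 2 / 12)       # standard deviation of U(a, b)


def uniform_prob(lo, hi, a, b):
    """P(lo < X < hi) for X ~ U(a, b): overlap width times height 1/(b - a)."""
    lo, hi = max(lo, a), min(hi, b)     # clip the interval to [a, b]
    return max(hi - lo, 0.0) / (b - a)


p_20_25 = uniform_prob(20, 25, a, b)
print(f"mean = {mean}, sd = {std_dev:.3f}, P(20 < X < 25) = {p_20_25:.4f}")
```

Because the density is flat, the probability is just the fraction of the 15-minute range that the 5-minute interval covers.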
  • 121. P(20 < X < 25) = (25 − 20)/(30 − 15) = 5/15 = 0.3333 = 33.33%. 7.3 Normal Distribution LO7-3: Know the form and parameters of the normal distribution. Characteristics of the Normal Distribution • Normal or Gaussian (or bell-shaped) distribution was named for German
  • 122. mathematician Karl Gauss (1777–1855). • Domain is −∞ < X < +∞. • Almost all (99.7%) of the area under the normal curve is included in the range μ − 3σ < X < μ + 3σ. • Symmetric and unimodal about the mean.
  • 123. • Normal PDF f(x) reaches a maximum at μ and has points of inflection at μ ± σ (bell-shaped curve). NOTE: All normal distributions have the same shape but differ in the axis scales.
  • 124. • Normal CDF. 7.4 Standard Normal Distribution. Characteristics of the Standard Normal Distribution • Since for every value of μ and σ there is a different normal distribution, we transform a normal random variable to a standard normal distribution with μ = 0 and σ = 1 using z = (x − μ)/σ.
  • 125. Characteristics of the Standard Normal • Standard normal PDF f(z) reaches a maximum at z = 0 and has points of inflection at ±1. • Shape is unaffected by the transformation. It is still a bell-shaped curve (Figure 7.11).
  • 126. • Standard normal CDF • A common scale from −3 to +3 is used. • Entire area under the curve is unity. • The probability of an event P(z1 < Z < z2) is a definite integral of f(z). • However, standard normal tables or Excel functions can be used to find the desired probabilities.
  • 127. Normal Areas from Appendix C-1 • Appendix C-1 allows you to find the area under the curve from 0 to z. • For example, find P(0 < Z < 1.96): the table gives .4750. • Now find P(−1.96 < Z < 1.96). • Due to symmetry, P(−1.96 < Z < 0) is the same as P(0 < Z < 1.96). • So, P(−1.96 < Z < 1.96) = .4750 + .4750 = .9500 or 95% of the area under the curve.
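The table lookups above can be reproduced with Python's standard library, as a rough alternative to Appendix C-1 or Excel. The variable names here are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()                                # standard normal: mu = 0, sigma = 1

# Area from 0 to z, which is what Appendix C-1 tabulates
area_0_to_z = Z.cdf(1.96) - Z.cdf(0)

# Symmetric middle area between -1.96 and +1.96
area_middle = Z.cdf(1.96) - Z.cdf(-1.96)

print(f"P(0 < Z < 1.96)     = {area_0_to_z:.4f}")
print(f"P(-1.96 < Z < 1.96) = {area_middle:.4f}")
```

`NormalDist.cdf` returns the area to the left of z, so an area between two points is always a difference of two CDF values.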
  • 128. Basis for the Empirical Rule • Approximately 68% of the area under the curve is between ±1σ. • Approximately 95% of the area under the curve is between ±2σ. • Approximately 99.7% of the area under the curve is between ±3σ.
  • 129. LO7-4: Find the normal probability for given z or x using tables or Excel. Normal Areas from Appendix C-2 • Appendix C-2 allows you to find the area under the curve from the left of z (similar to Excel). • For example, P(Z < −1.96) = .0250, P(Z < 1.96) = .9750, and P(−1.96 < Z < 1.96) = .9750 − .0250 = .9500.
  • 130. Normal Areas from Appendices C-1 or C-2 • Appendices C-1 and C-2 yield identical results. • Use whichever table is easiest. Finding z for a Given Area • Appendices C-1 and C-2 can be used to find the z-value corresponding to a given probability. • For example, what z-value defines the top 1% of a normal distribution? • This implies that 49% of the area lies between 0 and z, which gives z = 2.33 by looking for an area of 0.4900 in Appendix C-1.
  • 131. Finding Areas by Using Standardized Variables • Suppose John took an economics exam and scored 86 points. The class mean was 75 with a standard deviation of 7. What percentile is John in? That is, what is P(X < 86), where X represents the exam scores? • z = (x − μ)/σ = (86 − 75)/7 = 1.57 • So John's score is 1.57 standard deviations above the mean. • P(X < 86) = P(Z < 1.57) = .9418 (from Appendix C-2) • So, John is approximately in the 94th percentile.
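John's percentile can be checked without a table. The sketch below standardizes the score and then evaluates the normal CDF both with the table-style rounded z and with the unrounded value; the variable names are illustrative, and the small difference between the two results comes only from rounding z to two decimals.

```python
from statistics import NormalDist

mu, sigma = 75.0, 7.0           # class mean and standard deviation
score = 86.0                    # John's exam score

z = (score - mu) / sigma        # standardizing transformation
p_table = NormalDist().cdf(round(z, 2))     # table-style lookup with z = 1.57
p_exact = NormalDist(mu, sigma).cdf(score)  # unrounded, as software would report

print(f"z = {z:.2f}")
print(f"P(X < 86) with rounded z: {p_table:.4f}")
print(f"P(X < 86) unrounded:      {p_exact:.4f}")
```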
  • 132. NOTE: You can use Excel, Minitab, TI83/84 etc. to compute these probabilities directly. LO7-5: Solve for z or x for a normal probability using tables or Excel. • Inverse Normal
  • 133. • How can we find the various normal percentiles (5th, 10th, 25th, 75th, 90th, 95th, etc.), known as the inverse normal? That is, how can we find X for a given area? We simply turn the standardizing transformation around: x = μ + zσ. • For example, suppose that John's economics professor has decided that any student who scores below the 10th percentile must retake the exam. • The exam scores are normal with μ = 75 and σ = 7. • What is the score that would require a student to retake the
  • 134. exam? • We need to find the value of x that satisfies P(X < x) = .10. • The z-score for the 10th percentile is z = −1.28. • The steps to solve the problem are: • Use Appendix C or Excel to find z = −1.28 to satisfy P(Z < −1.28) = .10. • Substitute the given information into z = (x − μ)/σ to get −1.28 = (x − 75)/7. • Solve for x to get x = 75 − (1.28)(7) = 66.04 (or 66 after rounding). • Students who score below 66 points on the economics exam
  • 135. will be required to retake the exam.
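The retake cutoff is an inverse-normal calculation, which can be sketched with the standard library's `inv_cdf`. The variable names are illustrative. Software uses an unrounded z (about −1.2816), so its answer differs slightly from the hand calculation with z = −1.28.

```python
from statistics import NormalDist

mu, sigma = 75.0, 7.0                       # exam score parameters
exam = NormalDist(mu, sigma)

# 10th percentile: the score x with P(X < x) = .10
cutoff = exam.inv_cdf(0.10)

# Same result by turning the standardizing transformation around: x = mu + z*sigma
z_10 = NormalDist().inv_cdf(0.10)
cutoff_via_z = mu + z_10 * sigma

print(f"z for the 10th percentile: {z_10:.4f}")
print(f"retake cutoff: {cutoff:.2f}")
```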
  • 136. 7.5 Normal Approximations LO7-6: Use the normal approximation to a binomial or a Poisson. Normal Approximation to the Binomial • Binomial probabilities are difficult to calculate when n is large. • Use a normal approximation to the binomial distribution. • As n becomes large, the binomial bars become smaller and continuity is approached.
  • 137. • Rule of thumb: when nπ ≥ 10 and n(1 − π) ≥ 10, it is appropriate to use the normal approximation to the binomial distribution. • In this case, the mean and standard deviation for the binomial distribution are used as the normal μ and σ: μ = nπ and σ = √(nπ(1 − π)). Example: Coin Flips • If we were to flip a coin n = 32 times and π = .50, are the requirements for a normal approximation to the binomial distribution met? • nπ = 32 × .50 = 16 and n(1 − π) = 32 × (1 − .50) = 16
  • 138. • So, a normal approximation can be used. • When translating a discrete scale into a continuous scale, care must be taken about individual points. • For example, find the probability of more than 17 heads in 32 flips of a fair coin. • However, "more than 17" actually falls between 17 and 18 on a discrete scale. • Since the cutoff point for "more than 17" is halfway
  • 139. between 17 and 18, we add 0.5 to the lower limit and find P(X > 17.5). • This addition to X is called the Continuity Correction. • At this point, the problem can be completed as any normal distribution problem. Example: Coin Flips • P(X > 17) = P(X ≥ 18) ≈ P(X > 17.5) = P(Z > 0.53) = 0.2981, where z = (17.5 − 16)/2.828 = 0.53.
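The coin-flip example can be sketched in Python, with the exact binomial tail computed alongside for comparison. The exact calculation is an addition for illustration and is not part of the slides; the approximation uses an unrounded z, so it comes out slightly below the table value of 0.2981.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi = 32, 0.50                            # 32 flips of a fair coin

# Rule of thumb for using the normal approximation
rule_ok = n * pi >= 10 and n * (1 - pi) >= 10

mu = n * pi                                 # binomial mean
sigma = sqrt(n * pi * (1 - pi))             # binomial standard deviation

# "More than 17 heads" with the continuity correction: P(X > 17.5)
approx = 1 - NormalDist(mu, sigma).cdf(17.5)

# Exact binomial tail P(X >= 18), for comparison only
exact = sum(comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(18, n + 1))

print(f"rule of thumb met: {rule_ok}")
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```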
  • 140. Normal Approximation to the Poisson • The normal approximation to the Poisson distribution works best when λ is large (e.g., when λ exceeds the values in Appendix B). • Set the normal μ and σ equal to the mean and standard deviation for the Poisson distribution: μ = λ and σ = √λ. Example: Utility Bills • On Wednesday between 10 A.M. and noon, customer billing inquiries arrive at a mean rate of 42 inquiries per hour at Consumers Energy. What is the probability of receiving more than 50 calls in an hour?
  • 141. Example: Utility Bills • Here λ = 42, so μ = 42 and σ = √42 = 6.48. • To find P(X > 50) calls, use the continuity-corrected cutoff point halfway between 50 and 51 (i.e., X = 50.5). • At this point, the problem can be completed as any normal distribution problem.
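The utility-bills problem can be finished in Python as follows. This is a sketch assuming μ = λ and σ = √λ for the approximating normal; the exact Poisson tail is included only as a comparison and is not from the slides.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 42.0                                  # mean arrivals per hour

mu = lam                                    # Poisson mean
sigma = sqrt(lam)                           # Poisson standard deviation

# "More than 50 calls" with the continuity correction: P(X > 50.5)
z = (50.5 - mu) / sigma
approx = 1 - NormalDist().cdf(z)

# Exact Poisson tail P(X >= 51), for comparison only
exact = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(0, 51))

print(f"z = {z:.2f}")
print(f"normal approximation: {approx:.4f}")
print(f"exact Poisson:        {exact:.4f}")
```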
  • 142. 7.6 Exponential Distribution LO7-7: Find the exponential probability for a given x. Characteristics of the Exponential Distribution • If events per unit of time follow a Poisson distribution, the time until the next event follows the Exponential distribution. • The time until the next event is a continuous variable. NOTE: Here we will find probabilities > x or ≤ x.
  • 143. Characteristics of the Exponential Distribution • Probability of waiting more than x: P(X > x) = e^(−λx). • Probability of waiting less than or equal to x: P(X ≤ x) = 1 − e^(−λx). Example: Customer Waiting Time • Between 2 P.M. and 4 P.M. on Wednesday, patient insurance inquiries arrive at Blue Choice insurance at a mean rate of 2.2 calls per minute. • What is the probability of waiting more than 30 seconds (i.e., 0.50 minutes) for the next call?
  • 144. • P(X > 0.50) = e^(−λx) = e^(−(2.2)(0.5)) = .3329, or a 33.29% chance of waiting more than 30 seconds for the next call. • Equivalently, P(X ≤ 0.50) = 1 − .3329 = .6671. LO7-8: Solve for x for given
  • 145. exponential probability. Inverse Exponential • If the mean arrival rate is 2.2 calls per minute, we want the 90th percentile for waiting time (the top 10% of waiting time). • Find the x-value that defines the upper 10%.
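Both exponential calculations, the waiting-time probability and the inverse (percentile) problem, can be sketched in Python. The inverse step assumes the usual algebra: setting e^(−λx) = .10 and solving gives x = −ln(.10)/λ. The worked solution on the original slide is not in the transcript, so treat the result as a check rather than a quotation.

```python
from math import exp, log

lam = 2.2                                   # mean arrival rate, calls per minute

# Probability of waiting more than 0.50 minutes for the next call
p_more = exp(-lam * 0.50)
p_less_or_equal = 1 - p_more

# Inverse exponential: waiting time exceeded only 10% of the time
upper_tail = 0.10
x_90 = -log(upper_tail) / lam               # 90th percentile, in minutes

print(f"P(X > 0.50)  = {p_more:.4f}")
print(f"P(X <= 0.50) = {p_less_or_equal:.4f}")
print(f"90th percentile wait = {x_90:.4f} minutes")
```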
  • 146. Mean Time Between Events: MTBE = 1/λ. 7.7 Triangular Distribution LO7-9: Use the triangular distribution for "what-if" analysis (optional). Characteristics of the Triangular Distribution
  • 147. • The triangular distribution is a way of thinking about variation that corresponds rather well to what-if analysis in business. • It is not surprising that business analysts are attracted to the triangular model. • Its finite range and simple form are more understandable than a normal distribution. • It is more versatile than a normal, because it can be skewed in either
  • 148. direction. • Yet it has some of the nice properties of a normal, such as a distinct mode. • The triangular model is especially handy for what-if analysis when the business case depends on predicting a stochastic variable (e.g., the price of a raw material, an interest rate, a sales volume). • If the analyst can anticipate the range (a to c) and most likely value (b), it will be possible to calculate probabilities of various outcomes. • Many times, such distributions will be skewed, so a normal wouldn't be much help.
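To illustrate how probabilities follow from a range and a most likely value, here is a minimal Python sketch of the triangular CDF. The slides do not show the formula, so this uses the standard textbook form with a = minimum, b = mode, c = maximum; the function name and the numbers (a = 10, b = 20, c = 50) are hypothetical.

```python
def triangular_cdf(x, a, b, c):
    """P(X <= x) for a triangular distribution.

    a = lower limit, b = mode (most likely value), c = upper limit.
    Standard formula, assumed here; not shown in the slides.
    """
    if x <= a:
        return 0.0
    if x >= c:
        return 1.0
    if x <= b:
        return (x - a) ** 2 / ((c - a) * (b - a))        # rising side
    return 1.0 - (c - x) ** 2 / ((c - a) * (c - b))      # falling side


# Hypothetical what-if scenario: a price expected between 10 and 50, most likely 20
a, b, c = 10.0, 20.0, 50.0
p_below_mode = triangular_cdf(20.0, a, b, c)
p_below_35 = triangular_cdf(35.0, a, b, c)
print(p_below_mode, p_below_35)
```

With a mode near the lower limit the distribution is right-skewed, so only a quarter of the probability lies below the most likely value in this example.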