Chapter 12: Simple Regression

Chapter Contents
12.1 Visual Displays and Correlation Analysis
12.2 Simple Regression
12.3 Regression Terminology
12.4 Ordinary Least Squares Formulas
12.5 Tests for Significance
12.6 Analysis of Variance: Overall Fit
12.7 Confidence and Prediction Intervals for Y
12.8 Residual Tests
12.9 Unusual Observations
12.10 Other Regression Problems

Chapter Learning Objectives (LO’s)
LO12-1: Calculate and test a correlation coefficient for significance.
LO12-2: Interpret the slope and intercept of a regression equation.
LO12-3: Make a prediction for a given x value using a regression equation.
LO12-4: Fit a simple regression on an Excel scatter plot.
LO12-5: Calculate and interpret confidence intervals for regression coefficients.
LO12-6: Test hypotheses about the slope and intercept by using t tests.
LO12-7: Perform regression with Excel or other software.
LO12-8: Interpret the standard error, R², ANOVA table, and F test.
LO12-9: Distinguish between confidence and prediction intervals.
LO12-10: Test residuals for violations of regression assumptions.
LO12-11: Identify unusual residuals and high-leverage observations.

12.1 Visual Displays and Correlation Analysis

Visual Displays
• Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.
• A scatter plot
- displays each observed data pair (xi, yi) as a dot on an X/Y grid.
- indicates visually the strength of the relationship between the two variables.
[Figure: Sample Scatter Plot]
LO12-1: Calculate and test a correlation coefficient for significance.

Correlation Coefficient
• The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y.
-1 ≤ r ≤ +1
r = 0 indicates no linear relationship
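As a minimal sketch of how r can be computed by hand, the snippet below uses the usual sums-of-squares form of the sample correlation coefficient. The function name and the five data pairs are hypothetical, chosen only for illustration; they do not come from the slides.

```python
import math

def sample_correlation(x, y):
    """Sample correlation coefficient r for paired observations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squared deviations about the means
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(round(r, 4))
```

The result always falls between -1 and +1, matching the bounds stated above.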

Scatter Plots Showing Various Correlation Values
[Figure: six scatter plot panels: Strong Positive Correlation, Weak Positive Correlation, Weak Negative Correlation, Strong Negative Correlation, No Correlation, Nonlinear Relation]

Steps in Testing if ρ = 0 (population correlation = 0)
• Step 1: State the Hypotheses
Determine whether you are using a one- or two-tailed test and the level of significance (α).
• Step 2: Specify the Decision Rule
For degrees of freedom df = n - 2, look up the critical value in Appendix D.
• Note: r is an estimate of the population correlation coefficient ρ.
• Step 3: Calculate the Test Statistic
$t_{calc} = r \sqrt{\dfrac{n-2}{1-r^2}}$
• Step 4: Make the Decision
If the sample correlation coefficient r exceeds the critical value rα, then reject H0.
If using the t statistic method, reject H0 if the t statistic exceeds the critical value tα or if the p-value ≤ α.

Critical Value for Correlation Coefficient (Tests for Significance)
• Equivalently, you can calculate the critical value for the correlation coefficient using
$r_{\alpha} = \dfrac{t_{\alpha}}{\sqrt{t_{\alpha}^2 + n - 2}}$
• This method gives a benchmark for the correlation coefficient.
• However, it gives no p-value and is inflexible if you change your mind about α.
• MegaStat uses this method, giving two-tail critical values.
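The two equivalent routes for testing a correlation (the t statistic and the critical value of r) can be sketched as follows. The sample values r = 0.60 and n = 27 are hypothetical, and the critical t of 2.060 (two-tailed, α = 0.05, df = 25) is assumed to have been read from a t table rather than computed here.

```python
import math

def t_statistic_for_r(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def critical_r(t_crit, n):
    """Critical correlation value implied by a critical t value."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)

# Hypothetical sample; t_crit is an assumed table lookup for df = 25
r, n, t_crit = 0.60, 27, 2.060
t_calc = t_statistic_for_r(r, n)
r_crit = critical_r(t_crit, n)

# Both rules lead to the same decision
reject_by_t = abs(t_calc) > t_crit
reject_by_r = abs(r) > r_crit
print(round(t_calc, 3), round(r_crit, 3), reject_by_t, reject_by_r)
```

Because the critical r is just an algebraic rearrangement of the t statistic, the two decisions always agree for a given α and n.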

12.2 Simple Regression

What is Simple Regression?
• Simple regression analyzes the relationship between two variables.
• It specifies one dependent (response) variable and one independent (predictor) variable.
• The hypothesized relationship here will be linear.
LO12-2: Interpret the slope and intercept of a regression equation.

Interpreting an Estimated Regression Equation: Examples
LO12-3: Make a prediction for a given x value using a regression equation.

Prediction Using Regression: Examples

12.3 Regression Terminology

Model and Parameters
• The assumed model for a linear relationship is
$y = \beta_0 + \beta_1 x + \varepsilon$
• The relationship holds for all pairs (xi, yi).
• The error term ε is assumed to be independently normally distributed with mean of 0 and standard deviation σ.
• The unknown parameters are:
β0 the intercept
β1 the slope
• The fitted model or regression model used to predict the expected value of Y for a given value of X is
$\hat{y} = b_0 + b_1 x$
• The fitted coefficients are
b0 the estimated intercept
b1 the estimated slope
LO12-4: Fit a simple regression on an Excel scatter plot.
A more precise method is to let Excel calculate the estimates. We enter observations on the independent variable x1, x2, . . ., xn and the dependent variable y1, y2, . . ., yn into separate columns, and let Excel fit the regression equation, as illustrated in Figure 12.6. Excel will choose the regression coefficients so as to produce a good fit.

Slope and Intercept Interpretations
• Figure 12.6 (previous slide) shows a sample of miles per gallon and horsepower for 15 engines. The Excel graph and its fitted regression equation are also shown.
• Slope Interpretation: The slope of -0.0785 says that for each additional unit of engine horsepower, the miles per gallon decreases by 0.0785 mile. This estimated slope is a statistic because a different sample might yield a different estimate of the slope.
• Intercept Interpretation: The intercept value of 49.216 suggests that when the engine has no horsepower, the fuel efficiency would be quite high. However, the intercept has little meaning in this case, not only because zero horsepower makes no logical sense, but also because extrapolating to x = 0 is beyond the range of the observed data.
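To make the prediction step concrete, here is a small sketch that plugs a horsepower value into the fitted equation quoted above (intercept 49.216, slope -0.0785). The function name and the 200-horsepower input are hypothetical; the slides do not state the observed horsepower range, so whether 200 lies inside it is an assumption.

```python
def predict_mpg(horsepower, b0=49.216, b1=-0.0785):
    """Predicted miles per gallon from the fitted line y-hat = b0 + b1 * x."""
    return b0 + b1 * horsepower

# Hypothetical input value, assumed to lie within the observed data range
mpg_200 = predict_mpg(200)
print(round(mpg_200, 3))

# Each additional unit of horsepower lowers the prediction by 0.0785 mpg
slope_effect = predict_mpg(201) - predict_mpg(200)
print(round(slope_effect, 4))
```

Calling predict_mpg(0) simply returns the intercept, which is the extrapolation the slide warns against interpreting.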

12.4 Ordinary Least Squares (OLS) Formulas

Slope and Intercept
• The ordinary least squares method (OLS) estimates the slope and intercept of the regression line so that the sum of squared residuals is minimized.
• The sum of the residuals = 0.
• The sum of the squared residuals is SSE.
• The OLS estimator for the slope is:
$b_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
or
$b_1 = r \dfrac{s_Y}{s_X}$
• The OLS estimator for the intercept is:
$b_0 = \bar{y} - b_1 \bar{x}$
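The OLS slope and intercept can be sketched in a few lines using the standard deviation-from-the-mean form. The function name and the data are hypothetical, and this is an illustration of the textbook formulas rather than how Excel or MegaStat implement the fit internally.

```python
def ols_fit(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx          # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(x, y)

# The OLS residuals sum to zero (up to floating-point rounding)
residual_sum = sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))
print(round(b0, 4), round(b1, 4), round(residual_sum, 10))
```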

Assessing Fit
• We want to explain the total variation in Y around its mean (SST for Total Sums of Squares).
• The regression sum of squares (SSR) is the explained variation in Y.
• The error sum of squares (SSE) is the unexplained variation in Y.
• If the fit is good, SSE will be relatively small compared to SST.
• A perfect fit is indicated by an SSE = 0.
• The magnitude of SSE depends on n and on the units of measurement.

Coefficient of Determination
• R² is a measure of relative fit based on a comparison of SSR and SST.
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
• Often expressed as a percent, an R² = 1 (i.e., 100%) indicates perfect fit.
• In simple regression, R² = (r)².
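The decomposition SST = SSR + SSE and the resulting R² can be checked numerically. The sketch below refits a line to hypothetical data (the same illustrative pairs could be any small sample) and computes each sum of squares directly from its definition.

```python
def sums_of_squares(x, y):
    """Fit a simple regression and return SST, SSR, SSE, and R-squared."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
    return {"SST": sst, "SSR": ssr, "SSE": sse, "R2": ssr / sst}

# Hypothetical data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = sums_of_squares(x, y)
print({k: round(v, 4) for k, v in fit.items()})
```

For this sample R² is 0.6, which equals the square of the sample correlation of the same data, as the last bullet states.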

12.5 Tests for Significance
LO12-5: Calculate and interpret confidence intervals for regression coefficients.

Standard Error of Regression
• The standard error (s) is an overall measure of model fit.
• If the fitted model’s predictions are perfect (SSE = 0), then s = 0. Thus, a small s indicates a better fit.
• Used to construct confidence intervals.
• Magnitude of s depends on the units of measurement of Y and on data magnitude.
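The slide does not show the formula for s, so the sketch below assumes the usual simple-regression definition, s = sqrt(SSE / (n - 2)), where n - 2 reflects the two estimated coefficients. The SSE and n values are hypothetical.

```python
import math

def standard_error(sse, n):
    """Standard error of the regression, assuming s = sqrt(SSE / (n - 2))."""
    if n <= 2:
        raise ValueError("simple regression needs n > 2 observations")
    return math.sqrt(sse / (n - 2))

# Hypothetical values: SSE = 2.4 from a sample of n = 5
s = standard_error(2.4, 5)
print(round(s, 4))

# A perfect fit (SSE = 0) gives s = 0
s_perfect = standard_error(0.0, 5)
```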

Confidence Intervals for Slope and Intercept
• Standard error of the slope and intercept:
$s_{b_1} = \dfrac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
$s_{b_0} = s \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
• Confidence interval for the true slope and intercept (with df = n - 2, where $s_{b_1}$ and $s_{b_0}$ are the standard errors of the slope and intercept):
$b_1 \pm t_{\alpha/2}\, s_{b_1}$
$b_0 \pm t_{\alpha/2}\, s_{b_0}$
• Note: One can use Excel, Minitab, MegaStat or other software to compute these intervals and do hypothesis tests relating to linear regression.
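As a rough sketch of what the software computes, the function below derives the standard errors of the slope and intercept and the corresponding confidence intervals. The data are hypothetical, and the critical value 3.182 (two-tailed, α = 0.05, df = 3) is assumed to come from a t table; it is passed in rather than computed.

```python
import math

def coefficient_intervals(x, y, t_crit):
    """Confidence intervals for slope and intercept, given a critical t value."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                        # standard error of regression
    se_b1 = s / math.sqrt(ss_xx)                        # standard error of slope
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / ss_xx)   # standard error of intercept
    return {
        "se_b1": se_b1,
        "se_b0": se_b0,
        "slope": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
        "intercept": (b0 - t_crit * se_b0, b0 + t_crit * se_b0),
    }

# Hypothetical data; t_crit is an assumed table lookup for df = n - 2 = 3
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ci = coefficient_intervals(x, y, t_crit=3.182)
print(ci)
```

With so few observations the slope interval is wide and includes zero, which illustrates why small samples rarely give sharp conclusions.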
LO12-6: Test hypotheses about the slope and intercept by using t tests.

Hypothesis Tests
• If β1 = 0, then X cannot influence Y and the regression model collapses to a constant β0 plus random error.
• The hypotheses to be tested for the slope are:
H0: β1 = 0
H1: β1 ≠ 0
• The test statistic is $t_{calc} = \dfrac{b_1 - 0}{s_{b_1}}$, where $s_{b_1}$ is the standard error of the slope, with df = n - 2.
• Reject H0 if the absolute value of the test statistic exceeds the critical value tα/2 or if the p-value ≤ α.
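A minimal sketch of the slope test follows. The estimate, its standard error, and the critical value are hypothetical inputs (the critical t of 3.182 is assumed to be a table value for df = 3), and the function name is illustrative.

```python
def slope_t_test(b1, se_b1, t_crit, beta1_null=0.0):
    """Two-tailed t test of H0: beta1 = beta1_null.

    Returns the t statistic and whether H0 is rejected at the
    significance level implied by t_crit.
    """
    t_calc = (b1 - beta1_null) / se_b1
    return t_calc, abs(t_calc) > t_crit

# Hypothetical estimate and standard error from a sample with n = 5
t_calc, reject = slope_t_test(b1=0.6, se_b1=0.2828, t_crit=3.182)
print(round(t_calc, 3), reject)
```

Here the statistic falls short of the critical value, so the slope is not significantly different from zero in this small hypothetical sample.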

12.6 Analysis of Variance: Overall Fit
LO12-8: Interpret the standard error, R², ANOVA table, and F test.

F Test for Overall Fit
• To test a regression for overall significance, we use an F test to compare the explained (SSR) and unexplained (SSE) sums of squares.
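The slide does not show the F formula, so this sketch assumes the standard ANOVA layout for simple regression: SSR has 1 degree of freedom, SSE has n - 2, and F is the ratio of the two mean squares. The sums of squares used are hypothetical.

```python
def f_statistic(ssr, sse, n):
    """F statistic for overall fit in simple regression (df = 1 and n - 2)."""
    msr = ssr / 1          # mean square for regression
    mse = sse / (n - 2)    # mean square for error
    return msr / mse

# Hypothetical sums of squares from a sample of n = 5
f_calc = f_statistic(ssr=3.6, sse=2.4, n=5)
print(round(f_calc, 4))
```

In simple regression the F statistic equals the square of the t statistic for the slope, so the two tests always agree.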


12.7 Confidence and Prediction Intervals for Y
LO12-9: Distinguish between confidence and prediction intervals for Y.

How to Construct an Interval Estimate for Y
• Confidence interval for the conditional mean of Y.
• Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.
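The interval formulas themselves are not shown on the slide. The sketch below assumes the standard textbook forms: the half-width for the conditional mean uses sqrt(1/n + (x - x̄)² / Σ(xi - x̄)²), and the prediction interval adds 1 under the square root. The data, the standard error s, and the critical t are hypothetical inputs.

```python
import math

def interval_half_widths(x_new, x, s, t_crit):
    """Half-widths of the confidence interval (mean of Y) and the
    prediction interval (individual Y) at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    distance = (x_new - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s * math.sqrt(1 / n + distance)
    pi_half = t_crit * s * math.sqrt(1 + 1 / n + distance)
    return ci_half, pi_half

# Hypothetical inputs: s and t_crit assumed known (t_crit from a table, df = 3)
x = [1, 2, 3, 4, 5]
ci_half, pi_half = interval_half_widths(x_new=3, x=x, s=0.8944, t_crit=3.182)
ci_far, pi_far = interval_half_widths(x_new=5, x=x, s=0.8944, t_crit=3.182)
print(round(ci_half, 3), round(pi_half, 3), round(ci_far, 3), round(pi_far, 3))
```

Both intervals are narrowest at the mean of X and widen as x_new moves away from it, and the prediction interval is always the wider of the two.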


12.8 Residual Tests
LO12-10: Test residuals for violations of regression assumptions.

Three Important Assumptions
1. The errors are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are nonautocorrelated).

Non-normal Errors
• Non-normality of errors is a mild violation since the regression parameter estimates b0 and b1 and their variances remain unbiased and consistent.
• Confidence intervals for the parameters may be untrustworthy because the normality assumption is used to justify using Student’s t distribution.
• A large sample size would compensate.
• Outliers could pose serious problems.

Normal Probability Plot
• The Normal Probability Plot tests the assumption
H0: Errors are normally distributed
H1: Errors are not normally distributed
• If H0 is true, the residual probability plot should be linear as shown in the example.

What to Do About Non-Normality?
1. Trim outliers only if they clearly are mistakes.
2. Increase the sample size if possible.
3. Try a logarithmic transformation of both X and Y.

Heteroscedastic Errors (Non-constant Variance)
• The ideal condition is if the error magnitude is constant (i.e., errors are homoscedastic).
• Heteroscedastic errors increase or decrease with X.
• In the most common form of heteroscedasticity, the variances of the estimators are likely to be understated.
• This results in overstated t statistics and artificially narrow confidence intervals.

Tests for Heteroscedasticity
• Plot the residuals against X. Ideally, there is no pattern in the residuals moving from left to right.
• The “fan-out” pattern of increasing residual variance is the most common pattern indicating heteroscedasticity.

What to Do About Heteroscedasticity?
• Transform both X and Y, for example, by taking logs.
• Although it can widen the confidence intervals for the coefficients, heteroscedasticity does not bias the estimates.

Autocorrelated Errors
• Autocorrelation is a pattern of non-independent errors.
• In a first-order autocorrelation, et is correlated with et-1.
• The estimated variances of the OLS estimators are biased, resulting in confidence intervals that are too narrow, overstating the model’s fit.

Runs Test for Autocorrelation
• In the runs test, count the number of the residual’s sign reversals (i.e., how often does the residual cross the zero centerline?).
• If the pattern is random, the number of sign changes should be n/2.
• Fewer than n/2 would suggest positive autocorrelation.
• More than n/2 would suggest negative autocorrelation.

Durbin-Watson (DW) Test
• Tests for autocorrelation under the hypotheses
H0: Errors are non-autocorrelated
H1: Errors are autocorrelated
• The DW statistic will range from 0 to 4.
DW < 2 suggests positive autocorrelation
DW = 2 suggests no autocorrelation (ideal)
DW > 2 suggests negative autocorrelation
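Both diagnostics can be sketched from a list of residuals in time order. The sign-change count follows the description above; the DW formula is not shown on the slide, so the usual definition (sum of squared successive differences divided by the sum of squared residuals) is assumed. The residuals are hypothetical.

```python
def sign_changes(residuals):
    """Count zero-centerline crossings between consecutive residuals.
    Pairs involving an exact zero are not counted as a crossing."""
    return sum(1 for prev, cur in zip(residuals, residuals[1:]) if prev * cur < 0)

def durbin_watson(residuals):
    """DW statistic, assuming DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum((cur - prev) ** 2 for prev, cur in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals in time order
residuals = [1.0, 2.0, -1.0, -2.0, 1.0, 2.0]
changes = sign_changes(residuals)
dw = durbin_watson(residuals)
print(changes, len(residuals) / 2, round(dw, 3))
```

For these residuals both indicators point the same way: fewer sign changes than n/2 and a DW value below 2, each suggesting positive autocorrelation.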

What to Do About Autocorrelation?
• Transform both variables using the method of first differences, in which both variables are redefined as changes. Then we regress ΔY against ΔX.
• Although it can widen the confidence interval for the coefficients, autocorrelation does not bias the estimates.
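The method of first differences amounts to replacing each series by its period-to-period changes before fitting. A minimal sketch, with hypothetical series and an illustrative function name:

```python
def first_differences(values):
    """Redefine a time-ordered series as changes: value[t] - value[t-1]."""
    return [cur - prev for prev, cur in zip(values, values[1:])]

# Hypothetical time-ordered observations
x = [10, 12, 15, 19, 24]
y = [1, 3, 6, 10, 15]
dx = first_differences(x)
dy = first_differences(y)
print(dx, dy)
```

The differenced series have one fewer observation than the originals, and the regression is then run on dy against dx.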

12.9 Unusual Observations
LO12-11: Identify unusual residuals and high-leverage observations.

Standardized Residuals
• One can use Excel, Minitab, MegaStat or other software to compute standardized residuals.
• If the absolute value of any standardized residual is at least 2, then it is classified as unusual.

Leverage and Influence
• A high leverage statistic indicates the observation is far from the mean of X.
• These observations are influential because they are at the “end of the lever.”
• The leverage for observation i is denoted hi.

Leverage
• A leverage that exceeds 3/n is unusual.
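The leverage formula is not shown on the slide. The sketch below assumes the standard simple-regression definition, hi = 1/n + (xi - x̄)² / Σ(xj - x̄)², and applies the 3/n cutoff stated above as a default parameter. The data and function names are hypothetical.

```python
def leverages(x):
    """Leverage h_i of each observation, assuming
    h_i = 1/n + (x_i - x_bar)^2 / sum((x_j - x_bar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / ss_xx for xi in x]

def high_leverage_indices(x, multiple=3):
    """Indices whose leverage exceeds multiple/n (3/n per the slide)."""
    n = len(x)
    return [i for i, h in enumerate(leverages(x)) if h > multiple / n]

# Hypothetical X values with one observation far from the mean
x = [1, 2, 3, 4, 20]
h = leverages(x)
flagged = high_leverage_indices(x)
print([round(hi, 3) for hi in h], flagged)
```

The cutoff is left as a parameter because other rules of thumb use a different multiple of 1/n.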

12.10 Other Regression Problems

Outliers
Outliers may be caused by
- an error in recording data
- impossible data
- an observation that has been influenced by an unspecified “lurking” variable that should have been controlled but wasn’t.
To fix the problem,
- delete the observation(s)
- delete the data
- formulate a multiple regression model that includes the lurking variable.

Model Misspecification
• If a relevant predictor has been omitted, then the model is misspecified.
• Use multiple regression instead of bivariate regression.

Ill-Conditioned Data
• Well-conditioned data values are of the same general order of magnitude.
• Ill-conditioned data have unusually large or small data values and can cause loss of regression accuracy or awkward estimates.
• Avoid mixing magnitudes by adjusting the magnitude of your data before running the regression.

Spurious Correlation
• In a spurious correlation two variables appear related because of the way they are defined.
• This problem is called the size effect or problem of totals.

Model Form and Variable Transforms
• Sometimes a nonlinear model is a better fit than a linear model.
• Excel offers many model forms.
• Variables may be transformed (e.g., logarithmic or exponential functions) in order to provide a better fit.
• Log transformations reduce heteroscedasticity.
• Nonlinear models may be difficult to interpret.

Chapter 10: Two-Sample Hypothesis Tests

10.1 Two-Sample Tests
10.2 Comparing Two Means: Independent Samples
10.3 Confidence Interval for the Difference of Two Means, μ1 − μ2
10.4 Comparing Two Means: Paired Samples
10.5 Comparing Two Proportions
10.6 Confidence Interval for the Difference of Two Proportions, π1 − π2
10.7 Comparing Two Variances

Chapter Learning Objectives (LO’s)
LO10-1: Recognize and perform a test for two means with known σ1 and σ2.
LO10-2: Recognize and perform a test for two means with unknown σ1 and σ2.
LO10-3: Recognize paired data and be able to perform a paired t test.
LO10-4: Explain the assumptions underlying the two-sample test of means.
LO10-5: Perform a test to compare two proportions using z.
LO10-6: Check whether normality may be assumed for two proportions.
LO10-7: Use Excel to find p-values for two-sample tests using z or t.
LO10-8: Carry out a test of two variances using the F distribution.
LO10-9: Construct a confidence interval for μ1 − μ2 or π1 − π2 (optional).

What is a Two-Sample Test
• A two-sample test compares two sample estimates with each other.
• A one-sample test compares a sample estimate to a non-sample benchmark.

Basis of Two-Sample Tests
• Two-sample tests are especially useful because they possess a built-in point of comparison.
• The logic of two-sample tests is based on the fact that two samples drawn from the same population may yield different estimates of a parameter due to chance.
• If the two sample statistics differ by more than the amount attributable to chance, then we conclude that the samples came from populations with different parameter values.

Test Procedure
• State the hypotheses
• Set up the decision rule
• Insert the sample statistics
• Make a decision based on the critical values or using p-values

10.2 Comparing Two Means: Independent Samples
LO10-1: Recognize and perform a test for two means with known σ1 and σ2.
Format of Hypotheses
• The hypotheses for comparing two independent population means µ1 and µ2 are:
Case 1: Known Variances
• When the variances are known, use the normal distribution for the test (assuming a normal population).
• The test statistic is:
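The formula on this slide did not survive extraction. As a minimal sketch, assuming the usual known-variance statistic z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2) for H0: µ1 − µ2 = 0, the calculation could look like this. The function name and the input numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_z_test(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """Return (z, two-tailed p-value) for H0: mu1 - mu2 = 0 with known sigmas."""
    # Standard error of the difference of two independent sample means
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (xbar1 - xbar2) / se
    # Two-tailed p-value from the standard normal distribution
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hypothetical inputs (not taken from the slides)
z, p = two_mean_z_test(xbar1=105.0, xbar2=100.0,
                       sigma1=10.0, sigma2=10.0, n1=50, n2=50)
print(f"z = {z:.3f}, two-tailed p-value = {p:.4f}")
```

A small p-value (below the chosen α) corresponds to a test statistic in the rejection region.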
LO10-2: Recognize and perform a test for two means with unknown σ1 and σ2.
Case 2: Unknown Variances, Assumed Equal
• Since the variances are unknown, they must be estimated and the Student’s t distribution used to test the means.
• Assuming the population variances are equal, s1² and s2² can be used to estimate a common pooled variance sp².
Case 3: Unknown Variances, Assumed Unequal
• Welch-Satterthwaite test
• A Quick Rule for degrees of freedom is to use min(n1 – 1, n2 – 1).
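The Case 2 and Case 3 formulas are missing from the extracted slides. The sketch below assumes the standard textbook forms: a pooled variance with n1 + n2 − 2 degrees of freedom for Case 2, and the Welch-Satterthwaite degrees of freedom for Case 3. Function names and sample values are hypothetical.

```python
from math import sqrt


def pooled_t(xbar1, xbar2, s1, s2, n1, n2):
    """Case 2: t statistic and d.f. assuming equal population variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    t = (xbar1 - xbar2) / sqrt(sp2 / n1 + sp2 / n2)
    return t, n1 + n2 - 2


def welch_t(xbar1, xbar2, s1, s2, n1, n2):
    """Case 3: t statistic, Welch-Satterthwaite d.f., and the Quick Rule d.f."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (xbar1 - xbar2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    quick_df = min(n1 - 1, n2 - 1)  # conservative Quick Rule
    return t, df, quick_df


# Hypothetical summary statistics with equal sample sizes
t2, df2 = pooled_t(20.0, 18.0, 3.0, 4.0, 10, 10)
t3, df3, quick = welch_t(20.0, 18.0, 3.0, 4.0, 10, 10)
print(f"Case 2: t = {t2:.4f}, d.f. = {df2}")
print(f"Case 3: t = {t3:.4f}, d.f. = {df3:.2f}, quick rule d.f. = {quick}")
```

With equal sample sizes the two t statistics coincide and only the degrees of freedom differ. The Quick Rule never gives more degrees of freedom than the Welch-Satterthwaite formula.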

Summary for the Test Statistic
• If the population variances σ1² and σ2² are known, then use the normal distribution.
• If population variances are unknown and estimated using s1² and s2², then use the Student’s t distribution.
Steps in Testing Two Means
• Step 1: State the hypotheses.
• Step 2: Specify the decision rule. Choose α (the level of significance) and determine the critical value(s).
• Step 3: Calculate the test statistic.
• Step 4: Make the decision. Reject H0 if the test statistic falls in the rejection region(s) as defined by the critical value(s).
• Step 5: Take action based on the decision.
Which Assumption Is Best?
• If the sample sizes are equal, the Case 2 and Case 3 test statistics will be identical, although the degrees of freedom may differ.
• If the variances are similar, the two tests will usually agree.
• If no information about the population variances is available, then the best choice is Case 3.
• The fewer assumptions, the better.
Must Sample Sizes Be Equal?
• Unequal sample sizes are common and the formulas still apply.
Large Samples
• For unknown variances, if both samples are large (n1 ≥ 30 and n2 ≥ 30) and the population is not badly skewed, use the following formula with Appendix C.
Caution: Three Issues
1. Are the populations skewed? Are there outliers?
Check using histograms and/or dot plots of each sample. t tests are OK if moderately skewed, especially if samples are large. Outliers are more serious.
2. Are the sample sizes large (n ≥ 30)?
If samples are small, the mean is not a reliable indicator of central tendency and the test may lack power.
3. Is the difference important as well as significant?
A small difference in means or proportions could be significant if the sample size is large.

10.3 Confidence Interval for the Difference of Two Means, µ1 − µ2
LO10-9: Construct a confidence interval for μ1 − μ2 or π1 − π2 (optional).
Confidence Intervals for the Difference of Two Means
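The interval formulas on these slides were images and are not recoverable. As a hedged sketch, one common form for unknown, unequal variances is (x̄1 − x̄2) ± t·√(s1²/n1 + s2²/n2), with t taken from a table. The function name and data are hypothetical; the critical value 2.262 is the standard table value for d.f. = 9 at 95% confidence, using the Quick Rule min(n1 – 1, n2 – 1).

```python
from math import sqrt


def diff_means_interval(xbar1, xbar2, s1, s2, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 with separate (unpooled) variances."""
    margin = t_crit * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Hypothetical summary statistics; t_crit looked up for d.f. = 9, 95% confidence
low, high = diff_means_interval(20.0, 18.0, 3.0, 4.0, 10, 10, t_crit=2.262)
print(f"95% CI for mu1 - mu2: [{low:.3f}, {high:.3f}]")
print("Interval includes 0:", low <= 0 <= high)
```

When the interval includes 0, a two-tailed test at the matching α would not reject the hypothesis of equal means.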
10.4 Comparing Two Means: Paired Samples
LO10-3: Recognize paired data and be able to perform a paired t test.
Paired Data
• Data occur in matched pairs when the same item is observed twice but under different circumstances.
• For example, blood pressure is taken before and after a treatment is given.
• Paired data are typically displayed in columns.
Paired t Test
• Paired data typically come from a before/after experiment.
• In the paired t test, the difference between x1 and x2 is measured as d = x1 – x2.
• The mean and standard deviation for the differences d are given below.
• The test statistic is just that of a one-sample t test applied to the differences.
Steps in Testing Paired Data
• Step 1: State the hypotheses, for example
H0: µd = 0
H1: µd ≠ 0
• Step 2: Specify the decision rule. Choose α (the level of significance) and determine the critical values from Appendix D or with use of technology.
• Step 3: Calculate the test statistic t.
• Step 4: Make the decision. Reject H0 if the test statistic falls in the rejection region(s) as defined by the critical values.
Analogy to Confidence Interval
• A two-tailed test for a zero difference is equivalent to asking whether the confidence interval for the true mean difference µd includes zero.
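The paired-test formulas were lost in extraction. The sketch below assumes the usual one-sample form applied to the differences: d̄ is the mean of d, s_d its sample standard deviation, and t = d̄ / (s_d/√n) with d.f. = n – 1. The blood pressure readings are made up for illustration, and 2.776 is the standard two-tailed table value for d.f. = 4 at α = .05.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x1, x2):
    """Return (dbar, s_d, t, d.f.) for H0: mu_d = 0 on paired observations."""
    d = [a - b for a, b in zip(x1, x2)]  # d = x1 - x2 for each pair
    n = len(d)
    dbar, sd = mean(d), stdev(d)
    t = dbar / (sd / sqrt(n))
    return dbar, sd, t, n - 1


# Hypothetical before/after readings for five patients
before = [140, 135, 150, 142, 138]
after = [132, 133, 141, 140, 130]
dbar, sd, t, df = paired_t(before, after)

t_crit = 2.776  # table value for d.f. = 4, two-tailed alpha = .05
margin = t_crit * sd / sqrt(len(before))
ci = (dbar - margin, dbar + margin)

print(f"dbar = {dbar:.2f}, s_d = {sd:.4f}, t = {t:.4f}, d.f. = {df}")
print(f"95% CI for mu_d: [{ci[0]:.3f}, {ci[1]:.3f}]")
print("Reject H0:", abs(t) > t_crit, "| CI excludes 0:", not (ci[0] <= 0 <= ci[1]))
```

The last line prints the same answer twice by construction: rejecting H0 in the two-tailed test and the confidence interval excluding zero are the same event.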
10.5 Comparing Two Proportions
LO10-5: Perform a test to compare two proportions using z.
Testing for Zero Difference: π1 − π2 = 0
• To test for equality of two population proportions π1 and π2, use the following hypotheses, for example
H0: π1 − π2 = 0
H1: π1 − π2 ≠ 0
Sample Proportions
• The sample proportions are p1 = x1/n1 and p2 = x2/n2.
Pooled Proportion
• If H0 is true, there is no difference between π1 and π2, so the samples can be pooled to estimate the common population proportion.
Testing for Zero Difference: Test Statistic
• If the samples are large, p1 – p2 may be assumed normally distributed.
• The test statistic is the difference of the sample proportions divided by the standard error of the difference.
• The standard error is calculated by using the pooled proportion.
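The three bullets above describe the calculation in words; the formula itself is missing from the extracted text. A minimal sketch, assuming the standard pooled form pc = (x1 + x2)/(n1 + n2) and z = (p1 − p2) / √(pc(1 − pc)(1/n1 + 1/n2)), with hypothetical counts:

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (p1, p2, pooled proportion, z, two-tailed p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    se = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))  # standard error under H0
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1, p2, pc, z, p_value


# Hypothetical counts: 60 of 200 versus 40 of 200
p1, p2, pc, z, p_value = two_proportion_z(60, 200, 40, 200)
print(f"p1 = {p1:.2f}, p2 = {p2:.2f}, pooled = {pc:.2f}")
print(f"z = {z:.4f}, two-tailed p-value = {p_value:.4f}")
```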
Steps in Testing Two Proportions
• Step 1: State the hypotheses.
• Step 2: Specify the decision rule. Choose α (the level of significance) and determine the critical value(s).
• Step 3: Calculate the test statistic. Assuming that π1 = π2, use a pooled estimate of the common proportion.
• Step 4: Make the decision. Reject H0 if the test statistic falls in the rejection region(s) as defined by the critical value(s).
LO10-6: Check whether normality may be assumed for two proportions.
Testing for Zero Difference: Checking for Normality
• We have assumed a normal distribution for the statistic p1 – p2.
• This assumption can be checked.
• For each sample, the criterion is nπ ≥ 10 and n(1 − π) ≥ 10, using the sample proportion in place of π.
• If either sample proportion is not normal, their difference cannot safely be assumed normal.
• The sample size rule of thumb is equivalent to requiring that each sample contains at least 10 “successes” and at least 10 “failures.”
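The rule of thumb in the last bullet can be checked directly from the counts. This is a small sketch with a hypothetical helper name and made-up counts:

```python
def normality_ok(x, n, minimum=10):
    """True if a sample has at least `minimum` successes and failures."""
    successes, failures = x, n - x
    return successes >= minimum and failures >= minimum


# Both samples must pass before p1 - p2 is treated as normal
sample_a = (60, 200)  # hypothetical: 60 successes in 200 trials
sample_b = (4, 50)    # hypothetical: only 4 successes in 50 trials
both_ok = normality_ok(*sample_a) and normality_ok(*sample_b)
print("Sample A ok:", normality_ok(*sample_a))
print("Sample B ok:", normality_ok(*sample_b))
print("Difference may be assumed normal:", both_ok)
```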
Testing for Non-Zero Difference
10.6 Confidence Interval for the Difference of Two Proportions, π1 − π2
• If the confidence interval does not include 0, then we will reject the null hypothesis of no difference in the proportions.
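The interval formula is not present in the extracted text. A sketch under the usual large-sample assumption, (p1 − p2) ± z·√(p1(1 − p1)/n1 + p2(1 − p2)/n2), where the standard error is unpooled because no equality of proportions is assumed. Names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for pi1 - pi2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin


# Hypothetical counts: 60 of 200 versus 40 of 200
low, high = diff_proportions_interval(60, 200, 40, 200)
print(f"95% CI for pi1 - pi2: [{low:.4f}, {high:.4f}]")
print("Interval includes 0:", low <= 0 <= high)
```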
10.7 Comparing Two Variances
LO10-8: Carry out a test of two variances using the F distribution.
Format of Hypotheses
• To test whether two population means are equal, we may also need to test whether two population variances are equal.

The F Test
• The test statistic is the ratio of the sample variances: F = s1²/s2².
• If the variances are equal, this ratio should be near unity: F = 1.
• If the test statistic is far below 1 or above 1, we would reject the hypothesis of equal population variances.
• The numerator s1² has degrees of freedom df1 = n1 – 1 and the denominator s2² has degrees of freedom df2 = n2 – 1.
• The F distribution is skewed with the mean > 1 and its mode < 1.
The F Test: Critical Values
• Critical values for the F test are denoted FL (left tail) and FR (right tail).
• A right-tail critical value FR may be found from Appendix F using df1 and df2 degrees of freedom.
FR = Fdf1, df2
• A left-tail critical value FL may be found by reversing the numerator and denominator degrees of freedom, finding the critical value from Appendix F and taking its reciprocal:
FL = 1/Fdf2, df1
Steps in Testing Two Variances
• Step 2: Specify the decision rule.
Degrees of freedom are:
Numerator: df1 = n1 – 1
Denominator: df2 = n2 – 1
Choose α and find the left-tail and right-tail critical values from Appendix F.
• Step 3: Calculate the test statistic Fcalc = s1²/s2².
• Step 4: Make the decision. Reject H0 if the test statistic falls in the rejection regions as defined by the critical values FL and FR.
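The steps above can be sketched as follows. Critical values are passed in as table look-ups rather than computed, since the slides rely on Appendix F. The function name and sample values are hypothetical. The value 4.03 is the upper-tail .025 point of F with 9 and 9 degrees of freedom from a standard F table, and should be verified against Appendix F.

```python
def f_test_two_tailed(s1, s2, n1, n2, f_upper, f_upper_reversed):
    """Two-tailed F test for equal variances using table critical values.

    f_upper          : right-tail table value for (df1, df2)
    f_upper_reversed : right-tail table value for (df2, df1)
    """
    df1, df2 = n1 - 1, n2 - 1
    f_calc = s1**2 / s2**2
    f_right = f_upper
    f_left = 1 / f_upper_reversed  # reciprocal rule for the left tail
    reject = f_calc < f_left or f_calc > f_right
    return f_calc, (df1, df2), (f_left, f_right), reject


# Hypothetical samples: s1 = 4, s2 = 3, n1 = n2 = 10, alpha = .05 (.025 per tail)
f_calc, dfs, (f_left, f_right), reject = f_test_two_tailed(
    4.0, 3.0, 10, 10, f_upper=4.03, f_upper_reversed=4.03)
print(f"F = {f_calc:.3f}, d.f. = {dfs}")
print(f"Reject H0 if F < {f_left:.3f} or F > {f_right:.2f}: {reject}")
```

For a one-tailed test, only one of the two comparisons is used and the whole α goes into that tail.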
Comparison of Variances: One-Tailed Test
• Step 2: State the decision rule.
Degrees of freedom are:
Numerator: df1 = n1 – 1
Denominator: df2 = n2 – 1
Choose α and find the left-tail critical value from Appendix F.
• Step 3: Calculate the test statistic Fcalc = s1²/s2².
• Step 4: Make the decision. Reject H0 if the test statistic falls in the left-tail rejection region as defined by the critical value.
EXCEL’s F Test
Assumptions of the F Test
• The F test assumes that the populations being sampled are normal.
• It is sensitive to non-normality of the sampled populations.
• MINITAB reports both the F test and an alternative Levene’s test and p-values.
Chapter 8
Sampling Distributions and Estimation
8.1 Sampling Variation
8.2 Estimators and Sampling Errors
8.3 Sample Mean and the Central Limit Theorem
8.4 Confidence Interval for a Mean (μ) with Known σ
8.5 Confidence Interval for a Mean (μ) with Unknown σ
8.6 Confidence Interval for a Proportion (π)
8.7 Estimating from Finite Populations
8.8 Sample Size Determination for a Mean
8.9 Sample Size Determination for a Proportion
8.10 Confidence Interval for a Population Variance (Optional)
Chapter Learning Objectives (LO’s)
LO8-1: Define sampling error, parameter, and estimator.
LO8-2: Explain the desirable properties of estimators.
LO8-3: State the Central Limit Theorem for a mean.
LO8-4: Explain how sample size affects the standard error.
LO8-5: Construct a 90, 95, or 99 percent confidence interval for μ.
LO8-6: Know when to use Student’s t instead of z to estimate μ.
LO8-7: Construct a 90, 95, or 99 percent confidence interval for π.
LO8-8: Construct confidence intervals for finite populations.
LO8-9: Calculate sample size to estimate a mean or proportion.
LO8-10: Construct a confidence interval for a variance (optional).
8.1 Sampling Variation
• Sample statistic – a random variable whose value depends on which population items are included in the random sample.
• Depending on the sample size, the sample statistic could either represent the population well or differ greatly from the population.
• This sampling variation can easily be illustrated.
• Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants.
• The sample means tend to be close to the population mean.
• The dot plots show that the sample means have much less variation than the individual sample items.
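The dot plots are not reproduced in the extracted text, so here is a small simulation in the same spirit. The GMAT data are not given, so the population below is an assumption: 20,000 normally distributed scores with mean 520 and standard deviation 86, both hypothetical.

```python
import random
from statistics import mean, stdev

random.seed(12)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for the population of GMAT scores
population = [random.gauss(520, 86) for _ in range(20_000)]


def sample_means(pop, n, k):
    """Means of k random samples of size n drawn without replacement."""
    return [mean(random.sample(pop, n)) for _ in range(k)]


eight = sample_means(population, n=5, k=8)    # as on the slide
many = sample_means(population, n=5, k=2000)  # enough to see the pattern

print("Eight sample means:", [round(m, 1) for m in eight])
print(f"Population mean = {mean(population):.1f}, s.d. = {stdev(population):.1f}")
print(f"S.d. of 2000 sample means = {stdev(many):.1f}")
```

The standard deviation of the sample means should come out near the population standard deviation divided by √5, which is the standard error discussed in Section 8.3.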
8.2 Estimators and Sampling Distributions
LO8-1: Define sampling error, parameter and estimator.
Some Terminology
• Estimator – a statistic derived from a sample to infer the value of a population parameter.
• Estimate – the value of the estimator in a particular sample.
• Population parameters are usually represented by Greek letters and the corresponding statistic by Roman letters.
Examples of Estimators
Sampling Distributions
• The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken.
• Note: An estimator is a random variable since samples vary.
• Sampling error is the difference between an estimate and the corresponding population parameter. For example, if we use the sample mean as an estimate for the population mean, then the sampling error is x̄ − μ.
Bias
• Bias is the difference between the expected value of the estimator and the true parameter. Example for the mean: Bias = E(X̄) − μ.
• An estimator is unbiased if its expected value is the parameter being estimated. The sample mean is an unbiased estimator of the population mean since E(X̄) = μ.
• On average, an unbiased estimator neither overstates nor understates the true parameter.
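Bias can be checked exactly on a tiny population by listing every possible sample. The sketch below uses a hypothetical population {1, 2, 3, 6} and all 16 ordered samples of size 2 drawn with replacement. It compares the sample mean, the sample variance with divisor n – 1, and, as a contrasting biased estimator not discussed on the slides, the variance with divisor n.

```python
from itertools import product

population = [1, 2, 3, 6]  # hypothetical population
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance


def var(sample, divisor):
    """Sum of squared deviations from the sample mean over the given divisor."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor


samples = list(product(population, repeat=2))  # all 16 equally likely samples
n = 2

exp_mean = sum(sum(s) / n for s in samples) / len(samples)
exp_var_unbiased = sum(var(s, n - 1) for s in samples) / len(samples)
exp_var_biased = sum(var(s, n) for s in samples) / len(samples)

print(f"mu = {mu}, E(xbar) = {exp_mean}, bias = {exp_mean - mu}")
print(f"sigma^2 = {sigma2}, E(s^2, divisor n-1) = {exp_var_unbiased}")
print(f"E(variance, divisor n) = {exp_var_biased}, bias = {exp_var_biased - sigma2}")
```

The expected value of the sample mean equals μ, so its bias is zero. Dividing by n instead of n – 1 understates the variance on average.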

Efficiency
Note: Also, a desirable property for an estimator is for it to be unbiased.
• Efficiency refers to the variance of the estimator’s sampling distribution.
• A more efficient estimator has smaller variance.
Figure 8.6
Consistency
A consistent estimator converges toward the parameter being estimated as the sample size increases.
Figure 8.6
8.3 Sample Mean and the Central Limit Theorem
LO8-3: State the Central Limit Theorem for a mean.
The Central Limit Theorem is a powerful result that allows us to approximate the shape of the sampling distribution of the sample mean even when we don’t know what the population looks like.
• If the population is exactly normal, then the sample mean follows a normal distribution.
• As the sample size n increases, the distribution of sample means narrows in on the population mean µ.
• If the sample is large enough, the sample means will have approximately a normal distribution even if your population is not normal.
Illustrations of Central Limit Theorem
Using the uniform and a right skewed distribution.
Applying The Central Limit Theorem
The Central Limit Theorem permits us to define an interval within which the sample means are expected to fall. As long as the sample size n is large enough, we can use the normal distribution regardless of the population shape (or any n if the population is normal to begin with).

LO8-4: Explain how sample size affects the standard error.
Sample Size and Standard Error
Even if the population standard deviation σ is large, the sample means will fall within a narrow interval as long as n is large. The key is the standard error of the mean: σ/√n. The standard error decreases as n increases.
For example, when n = 4 the standard error is halved (relative to a single observation, n = 1). To halve it again requires n = 16, and to halve it again requires n = 64. To halve the standard error, you must quadruple the sample size (the law of diminishing returns).
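The quadrupling rule follows directly from the square root in σ/√n. A short sketch, using σ = 1 as a convenient unit (an assumption, since the rule does not depend on σ):

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean for a sample of size n."""
    return sigma / sqrt(n)


sigma = 1.0  # any positive value shows the same pattern
for n in (1, 4, 16, 64):
    print(f"n = {n:2d}: standard error = {standard_error(sigma, n):.4f}")
```

Each fourfold increase in n cuts the standard error in half, so going from 0.5 to 0.25 costs 12 more observations while going from 0.25 to 0.125 costs 48 more.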

Illustration: All Possible Samples from a Uniform Population
• Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}.
• The population parameters are: μ = 1.5 and σ = 1.118.
• The population is uniform, yet the distribution of all possible sample means of size 2 has a peaked triangular shape.
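The triangular shape can be reproduced by enumerating the samples. The sketch assumes ordered samples of size 2 drawn with replacement, which gives 16 equally likely samples; the slide's own table is not available to confirm that this is the enumeration it used.

```python
from collections import Counter
from itertools import product
from math import sqrt

population = [0, 1, 2, 3]
mu = sum(population) / len(population)
sigma = sqrt(sum((x - mu) ** 2 for x in population) / len(population))

# Mean of every ordered sample of size 2 (with replacement)
means = [(a + b) / 2 for a, b in product(population, repeat=2)]
counts = Counter(means)

print(f"mu = {mu}, sigma = {sigma:.3f}")
for value in sorted(counts):
    print(f"xbar = {value:3.1f}  {'*' * counts[value]}")
```

The frequencies rise from 1 at x̄ = 0 to 4 at x̄ = 1.5 and fall back to 1 at x̄ = 3, even though every population value is equally likely.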
8.4 Confidence Interval for a Mean (μ) with Known σ
LO8-5: Construct a 90, 95, or 99 percent confidence interval for μ.
What is a Confidence Interval?
Choosing a Confidence Level
• A higher confidence level leads to a wider confidence interval.
• Greater confidence implies loss of precision (i.e. greater margin of error).
• 95% confidence is most often used.
Confidence Intervals for Example 8.2
Interpretation
• A confidence interval either does or does not contain μ.
• The confidence level quantifies the risk.
• Out of 100 confidence intervals, approximately 95% may contain μ, while approximately 5% might not contain μ when constructing 95% confidence intervals.
When Can We Assume Normality?
• If σ is known and the population is normal, then we can safely use the formula to compute the confidence interval.
• If σ is known and we do not know whether the population is normal, a common rule of thumb is that n ≥ 30 is sufficient to use the formula as long as the distribution is approximately symmetric with no outliers.
• Larger n may be needed to assume normality if you are sampling from a strongly skewed population or one with outliers.

8.5 Confidence Interval for a Mean (μ) with Unknown σ
LO8-6: Know when to use Student’s t instead of z to estimate μ.
Student’s t Distribution
• Use the Student’s t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small.
• t distributions are symmetric and shaped like the standard normal distribution.
• The t distribution is dependent on the size of the sample.
Figure 8.11: Comparison of Normal and Student’s t
Degrees of Freedom
• Degrees of Freedom (d.f.) is a parameter based on the sample size that is used to determine the value of the t statistic.
• Degrees of freedom tell how many observations are used to calculate the sample standard deviation s, less the number of intermediate estimates used in the calculation. The d.f. for the t distribution in this case is given by d.f. = n – 1.
• As n increases, the t distribution approaches the shape of the normal distribution.
• For a given confidence level, t is always larger than z, so a confidence interval based on t is always wider than if z were used.
Comparison of z and t
• For very small samples, t-values differ substantially from the normal.
• As degrees of freedom increase, the t-values approach the normal z-values.
• For example, for n = 31, the degrees of freedom, d.f. = 31 – 1 = 30. So for a 90 percent confidence interval, we would use t = 1.697, which is only slightly larger than z = 1.645.
Example GMAT Scores Again
Figure 8.13
• Construct a 90% confidence interval for the mean GMAT score of all MBA applicants.
x̄ = 510, s = 73.77
• Since σ is unknown, use the Student’s t for the confidence interval with d.f. = 20 – 1 = 19.
• For a 90% confidence interval, use Appendix D to find t0.05 = 1.729 with d.f. = 19.
Note: One can use Excel, Minitab, etc. to obtain these values as well as to construct confidence intervals.
We are 90 percent confident that the true mean GMAT score might be within the interval [481.48, 538.52].
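The interval can be reproduced from the quantities on the slide (x̄ = 510, s = 73.77, n = 20, t = 1.729), assuming the usual form x̄ ± t·s/√n. The function name is hypothetical.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Confidence interval for a mean with unknown sigma: xbar +/- t * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


# Values from the slide; t_crit is the Appendix D value for d.f. = 19
low, high = t_interval(xbar=510, s=73.77, n=20, t_crit=1.729)
print(f"90% CI: [{low:.2f}, {high:.2f}]")
```

The printed bounds agree with the interval [481.48, 538.52] reported on the slide.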
Confidence Interval Width
• Confidence interval width reflects
- the sample size,
- the confidence level and
- the standard deviation.
• To obtain a narrower interval and more precision
- increase the sample size or
- lower the confidence level (e.g., from 90% to 80% confidence).
C

ha
) with
--66
Using Appendix DUsing Appendix D
8
•• Beyond Beyond d.f. d.f. = 50, Appendix D shows = 50,
Appendix D shows d.f. d.f. in steps of 5 or 10.in steps of 5 or
10.
•• If the table does not give the exact degrees of freedom, use
the If the table does not give the exact degrees of freedom, use
the g g ,g g ,
tt--value for the next lower degrees of freedom.value for the
next lower degrees of freedom.
•• This is a conservative procedure since it causes the interval
to be This is a conservative procedure since it causes the
interval to be
slightly wider.slightly wider.
• A conservative statistician may use the t distribution for
confidence intervals when σ is unknown becauseconfidence
intervals when σ is unknown because
using z would underestimate the margin of error.
8.6 Confidence Interval for a Proportion (π)
LO8-7: Construct a 90, 95, or 99 percent confidence interval for π.
• A proportion is a mean of data whose only values are 0 or 1.
Applying the CLT
• The distribution of a sample proportion p = x/n is symmetric if π = .50 and, regardless of π, approaches symmetry as n increases.
When is it Safe to Assume Normality of p?
• Rule of Thumb: The sample proportion p = x/n may be assumed to be normal if both nπ ≥ 10 and n(1 – π) ≥ 10.
Sample size to assume normality: see Table 8.9.
• Since π is unknown, the confidence interval for p = x/n (assuming a large sample) is p ± z√(p(1 – p)/n).
Example: Auditing
8.7 Estimating from Finite Population
LO8-8: Construct confidence intervals for finite populations.
N = population size; n = sample size

8.8 Sample Size Determination for a Mean
LO8-9: Calculate sample size to estimate a mean or proportion.
• To estimate a population mean with a precision of ± E (allowable error), you would need a sample of size n = (zσ/E)². Now, σ is usually unknown, so it must be estimated.
• Method 1: Take a Preliminary Sample
Take a small preliminary sample and use the sample s in place of σ in the sample size formula.
• Method 2: Assume Uniform Population
Estimate rough upper and lower limits a and b and set σ = [(b – a)²/12]½.
• Method 3: Assume Normal Population
Estimate rough upper and lower limits a and b and set σ = (b – a)/4. This assumes normality with most of the data within µ ± 2σ, so the range is about 4σ.
• Method 4: Poisson Arrivals
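The sample size formula and the uniform and normal estimates of σ listed above can be sketched as follows. The limits a = 10 and b = 50, the allowable error E = 2, and all function names are hypothetical values chosen only for illustration; z comes from Python's `statistics.NormalDist` rather than a table.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, E, confidence=0.95):
    """n = (z * sigma / E)^2, rounded up to the next whole observation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / E) ** 2)

def sigma_uniform(a, b):
    """Method 2: standard deviation of a uniform population on [a, b]."""
    return math.sqrt((b - a) ** 2 / 12)

def sigma_normal(a, b):
    """Method 3: treat the range b - a as roughly 4 standard deviations."""
    return (b - a) / 4

a, b = 10, 50  # hypothetical rough lower and upper limits
n_uniform = sample_size_for_mean(sigma_uniform(a, b), E=2)
n_normal = sample_size_for_mean(sigma_normal(a, b), E=2)
print(n_uniform, n_normal)
```

The uniform assumption gives the larger σ and therefore the larger sample, which makes it the more cautious of the two choices.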
8.9 Sample Size Determination for a Proportion
• To estimate a population proportion with a precision of ± E (allowable error), you would need a sample of size n = (z/E)² π(1 – π).
• Since π is a number between 0 and 1, the allowable error E is also between 0 and 1.
• Method 1: Assume that π = .50
This conservative method ensures the desired precision. However, the sample may end up being larger than necessary.
• Method 2: Take a Preliminary Sample
Take a small preliminary sample and use the sample p in place of π in the sample size formula.
• Method 3: Use a Prior Sample or Historical Data
How often are such samples available? Unfortunately, π might be different enough to make it a questionable assumption.
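A short sketch of the proportion sample size formula shows why π = .50 is the conservative choice. The allowable error E = 0.03 and the prior estimate π = 0.20 are hypothetical inputs, and `sample_size_for_proportion` is an illustrative name.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(E, pi=0.5, confidence=0.95):
    """n = (z / E)^2 * pi * (1 - pi), rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / E) ** 2 * pi * (1 - pi))

n_conservative = sample_size_for_proportion(E=0.03)       # Method 1: pi = .50
n_prior = sample_size_for_proportion(E=0.03, pi=0.20)     # Method 3: prior estimate
print(n_conservative, n_prior)
```

Because π(1 – π) peaks at π = .50, any other assumed value yields a smaller required sample.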
8.10 Confidence Interval for a Population Variance (σ²) (optional)
Chi-Square Distribution
• If the population is normal, then the sample variance s² follows the chi-square distribution (χ²) with degrees of freedom d.f. = n – 1.
• Lower (χ²L) and upper (χ²U) tail percentiles for the chi-square distribution can be found using Appendix E.
Confidence Interval
• Using the sample variance s², the confidence interval is (n – 1)s²/χ²U ≤ σ² ≤ (n – 1)s²/χ²L.
• To obtain a confidence interval for the standard deviation σ, just take the square root of the interval bounds.
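As an illustration of the variance interval, the sketch below reuses the GMAT sample figures (s = 73.77, n = 20) purely as an assumed input. The chi-square critical values 10.117 and 30.144 are standard table values for d.f. = 19 at 90% confidence, typed in by hand; they should be checked against Appendix E.

```python
import math

def variance_confidence_interval(s, n, chi2_lower, chi2_upper):
    """Return (lower, upper) bounds for sigma^2 given table chi-square values."""
    sum_sq = (n - 1) * s ** 2
    # Dividing by the UPPER critical value gives the LOWER bound, and vice versa.
    return sum_sq / chi2_upper, sum_sq / chi2_lower

var_lo, var_hi = variance_confidence_interval(
    s=73.77, n=20, chi2_lower=10.117, chi2_upper=30.144  # assumed table values
)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)  # interval for sigma
print(var_lo, var_hi, sd_lo, sd_hi)
```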
You can use Appendix E to find critical chi-square values.
Caution: Assumption of Normality
• The methods described for confidence interval estimation of the variance and standard deviation depend on the population having a normal distribution.
• If the population does not have a normal distribution, then the confidence interval should not be considered accurate.
Chapter 7
Continuous Probability Distributions
7.1 Describing a Continuous Distribution
7.2 Uniform Continuous Distribution
7.3 Normal Distribution
7.4 Standard Normal Distribution
7.5 Normal Approximations
7.6 Exponential Distribution
7.7 Triangular Distribution (Optional)
Chapter Learning Objectives (LO's)
LO7-1: Define a continuous random variable.
LO7-2: Calculate uniform probabilities.
LO7-3: Know the form and parameters of the normal distribution.
LO7-4: Find the normal probability for given z or x using tables or Excel.
LO7-5: Solve for z or x for a given normal probability using tables or Excel.
LO7-6: Use the normal approximation to a binomial or a Poisson.
LO7-7: Find the exponential probability for a given x.
LO7-8: Solve for x for given exponential probability.
LO7-9: Use the triangular distribution for "what-if" analysis (optional).
7.1 Describing a Continuous Distribution
LO7-1: Define a continuous random variable.
Events as Intervals
• Discrete Variable – each value of X has its own probability P(X).
• Continuous Variable – events are intervals and probabilities are areas under continuous curves. A single point has no probability.

PDF – Probability Density Function
Continuous PDF's:
• Denoted f(x)
• Must be nonnegative
• Total area under curve = 1
• Mean, variance and shape depend on the PDF parameters
• Reveals the shape of the distribution

CDF – Cumulative Distribution Function
Continuous CDF's:
• Denoted F(x)
• Shows P(X ≤ x), the cumulative proportion of scores
• Useful for finding probabilities
Probabilities as Areas
Continuous probability functions:
• Unlike discrete distributions, the probability at any single point = 0.
• The entire area under any PDF, by definition, is set to 1.
• Mean is the balance point of the distribution.
Expected Value and Variance
The mean and variance of a continuous random variable are analogous to E(X) and Var(X) for a discrete random variable. Here the integral sign replaces the summation sign. Calculus is required to compute the integrals.
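The slide's formulas did not survive extraction. The standard definitions for a continuous random variable with PDF f(x), written here as a reconstruction rather than a copy of the slide, are:

```latex
\mu = E(X) = \int_{-\infty}^{+\infty} x \, f(x) \, dx
\qquad
\sigma^2 = \mathrm{Var}(X) = \int_{-\infty}^{+\infty} (x - \mu)^2 \, f(x) \, dx
```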
7.2 Uniform Continuous Distribution
LO7-2: Calculate uniform probabilities.
Characteristics of the Uniform Distribution
If X is a random variable that is uniformly distributed between a and b, its PDF has constant height.
• Denoted U(a, b)
• Area = base x height = (b-a) x 1/(b-a) = 1
Example: Anesthesia Effectiveness
• An oral surgeon injects a painkiller prior to extracting a tooth. Given the varying characteristics of patients, the dentist views the time for anesthesia effectiveness as a uniform random variable that takes between 15 minutes and 30 minutes.
• X is U(15, 30)
• a = 15, b = 30, find the mean and standard deviation.
• Find the probability that the effective anesthetic takes between 20 and 25 minutes.
P(20 < X < 25) = (25 – 20)/(30 – 15) = 5/15 = 0.3333 = 33.33%
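The anesthesia example can be reproduced with the standard uniform formulas, mean (a + b)/2 and standard deviation √((b – a)²/12). The helper names below are hypothetical; this is a sketch for checking the arithmetic, not part of the original slides.

```python
import math

def uniform_mean(a, b):
    return (a + b) / 2

def uniform_sd(a, b):
    return math.sqrt((b - a) ** 2 / 12)

def uniform_prob(a, b, lo, hi):
    """P(lo < X < hi) for X ~ U(a, b): interval width over total width."""
    lo, hi = max(lo, a), min(hi, b)  # clip the event to the support
    return max(hi - lo, 0) / (b - a)

mean = uniform_mean(15, 30)
sd = uniform_sd(15, 30)
prob = uniform_prob(15, 30, 20, 25)
print(mean, round(sd, 2), round(prob, 4))
```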
7.3 Normal Distribution
LO7-3: Know the form and parameters of the normal distribution.
Characteristics of the Normal Distribution
• Normal or Gaussian (or bell shaped) distribution was named for German mathematician Karl Gauss (1777 – 1855).
• Domain is –∞ < X < +∞.
• Almost all (99.7%) of the area under the normal curve is included in the range µ – 3σ < X < µ + 3σ.
• Symmetric and unimodal about the mean.
• Normal PDF f(x) reaches a maximum at µ and has points of inflection at µ ± σ.
Bell-shaped curve
NOTE: All normal distributions have the same shape but differ in the axis scales.
• Normal CDF
7.4 Standard Normal Distribution
Characteristics of the Standard Normal Distribution
• Since for every value of µ and σ there is a different normal distribution, we transform a normal random variable to a standard normal distribution with µ = 0 and σ = 1 using z = (x – µ)/σ.
• Standard normal PDF f(z) reaches a maximum at z = 0 and has points of inflection at ±1.
• Shape is unaffected by the transformation. It is still a bell-shaped curve.
Figure 7.11
• Standard normal CDF
• A common scale from -3 to +3 is used.
• Entire area under the curve is unity.
• The probability of an event P(z1 < Z < z2) is a definite integral of f(z).
• However, standard normal tables or Excel functions can be used to find the desired probabilities.
Normal Areas from Appendix C-1
• Appendix C-1 allows you to find the area under the curve from 0 to z.
• For example, P(0 < Z < 1.96) = .4750.
• Now find P(-1.96 < Z < 1.96).
• Due to symmetry, P(-1.96 < Z < 0) is the same as P(0 < Z < 1.96).
• So, P(-1.96 < Z < 1.96) = .4750 + .4750 = .9500 or 95% of the area under the curve.
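The table lookups above can be cross-checked in software. The slides mention Excel; this sketch instead uses Python's standard-library `statistics.NormalDist`, which is a substitution chosen here, not something the deck prescribes.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Area from 0 to z (what Appendix C-1 tabulates)
area_0_to_z = Z.cdf(1.96) - Z.cdf(0)

# Area between -z and +z (two symmetric halves)
area_between = Z.cdf(1.96) - Z.cdf(-1.96)

print(round(area_0_to_z, 4), round(area_between, 4))
```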

Basis for the Empirical Rule
• Approximately 68% of the area under the curve is between ±1σ.
• Approximately 95% of the area under the curve is between ±2σ.
• Approximately 99.7% of the area under the curve is between ±3σ.
LO7-4: Find the normal probability for given z or x using tables or Excel.
Normal Areas from Appendix C-2
• Appendix C-2 allows you to find the area under the curve from the left of z (similar to Excel).
• For example: P(Z < -1.96), P(Z < 1.96), and P(-1.96 < Z < 1.96).
Normal Areas from Appendices C-1 or C-2
• Appendices C-1 and C-2 yield identical results.
• Use whichever table is easiest.
Finding z for a Given Area
• Appendices C-1 and C-2 can be used to find the z-value corresponding to a given probability.
• For example, what z-value defines the top 1% of a normal distribution?
• This implies that 49% of the area lies between 0 and z, which gives z = 2.33 by looking for an area of 0.4900 in Appendix C-1.
Finding Areas by using Standardized Variables
• Suppose John took an economics exam and scored 86 points. The class mean was 75 with a standard deviation of 7. What percentile is John in? That is, what is P(X < 86) where X represents the exam scores?
• z = (x – µ)/σ = (86 – 75)/7 = 1.57
• So John's score is 1.57 standard deviations above the mean.
• P(X < 86) = P(Z < 1.57) = .9418 (from Appendix C-2)
• So, John is approximately in the 94th percentile.
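John's percentile can be verified with a short sketch. It computes the z-score, then the area to its left, once with z rounded to two decimals (mimicking the table lookup) and once without rounding. The variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma, x = 75, 7, 86

z = (x - mu) / sigma                              # standardized score
percentile_table = NormalDist().cdf(round(z, 2))  # table-style: z rounded to 1.57
percentile_exact = NormalDist(mu, sigma).cdf(x)   # no rounding of z

print(round(z, 2), round(percentile_table, 4), round(percentile_exact, 4))
```

The small gap between the two results comes only from rounding z before the lookup.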
NOTE: You can use Excel, Minitab, TI83/84 etc. to compute these probabilities directly.
LO7-5: Solve for z or x for a normal probability using tables or Excel.
Inverse Normal
• How can we find the various normal percentiles (5th, 10th, 25th, 75th, 90th, 95th, etc.) known as the inverse normal? That is, how can we find X for a given area? We simply turn the standardizing transformation around: x = µ + zσ.
• For example, suppose that John's economics professor has decided that any student who scores below the 10th percentile must retake the exam.
• The exam scores are normal with μ = 75 and σ = 7.
• What is the score that would require a student to retake the exam?
• We need to find the value of x that satisfies P(X < x) = .10.
• The z-score for the 10th percentile is z = −1.28.
Inverse Normal
• The steps to solve the problem are:
- Use Appendix C or Excel to find z = −1.28 to satisfy P(Z < −1.28) = .10.
- Substitute the given information into z = (x − μ)/σ to get −1.28 = (x − 75)/7
- Solve for x to get x = 75 − (1.28)(7) = 66.04 (or 66 after rounding)
• Students who score below 66 points on the economics exam will be required to retake the exam.
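The retake cutoff can also be found directly with an inverse-normal function. This sketch uses `NormalDist.inv_cdf` from Python's standard library in place of the table lookup; it uses the unrounded z, so the cutoff differs from the hand calculation in the second decimal place.

```python
from statistics import NormalDist

mu, sigma = 75, 7

z = NormalDist().inv_cdf(0.10)               # z-score of the 10th percentile
cutoff = NormalDist(mu, sigma).inv_cdf(0.10)  # same as mu + z * sigma

print(round(z, 2), round(cutoff, 2))
```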
7.5 Normal Approximations
LO7-6: Use the normal approximation to a binomial or a Poisson.
Normal Approximation to the Binomial
• Binomial probabilities are difficult to calculate when n is large.
• Use a normal approximation to the binomial distribution.
• As n becomes large, the binomial bars become smaller and continuity is approached.
Normal Approximation to the Binomial
• Rule of thumb: when nπ ≥ 10 and n(1 – π) ≥ 10, it is appropriate to use the normal approximation to the binomial distribution.
• In this case, the mean and standard deviation for the binomial distribution are used as the normal µ and σ: µ = nπ and σ = √(nπ(1 – π)).
Example: Coin Flips
• If we were to flip a coin n = 32 times and π = .50, are the requirements for a normal approximation to the binomial distribution met?
• nπ = 32 x .50 = 16 and n(1 – π) = 32 x (1 – .50) = 16
• So, a normal approximation can be used.
• When translating a discrete scale into a continuous scale, care must be taken about individual points.
• For example, find the probability of more than 17 heads in 32 flips of a fair coin.
• However, "more than 17" actually falls between 17 and 18 on a discrete scale.
• Since the cutoff point for "more than 17" is halfway between 17 and 18, we add 0.5 to the lower limit and find P(X > 17.5).
• This addition to X is called the Continuity Correction.
• At this point, the problem can be completed as any normal distribution problem.
With µ = 16 and σ = √(32 x .50 x .50) = 2.828:
P(X > 17) = P(X ≥ 18) ≈ P(X ≥ 17.5) = P(Z > 0.53) = 0.2981
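To see how well the continuity-corrected approximation performs on the coin-flip example, the sketch below compares it with the exact binomial tail. The exact sum is an addition for comparison and is not part of the slides; variable names are illustrative.

```python
import math
from statistics import NormalDist

n, pi = 32, 0.50
mu = n * pi
sigma = math.sqrt(n * pi * (1 - pi))

# Normal approximation with continuity correction: P(X > 17.5)
approx = 1 - NormalDist(mu, sigma).cdf(17.5)

# Exact binomial probability of 18 or more heads
exact = sum(math.comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(18, n + 1))

print(round(approx, 4), round(exact, 4))
```

The approximation here uses the unrounded z, so it differs slightly from the table-based 0.2981.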
Normal Approximation to the Poisson
• The normal approximation to the Poisson distribution works best when λ is large (e.g., when λ exceeds the values in Appendix B).
• Set the normal µ and σ equal to the mean and standard deviation for the Poisson distribution: µ = λ and σ = √λ.
Example: Utility Bills
• On Wednesday between 10 A.M. and noon, customer billing inquiries arrive at a mean rate of 42 inquiries per hour at Consumers Energy. What is the probability of receiving more than 50 calls in an hour?
• To find P(X > 50) calls, use the continuity-corrected cutoff point halfway between 50 and 51 (i.e., X = 50.5).
• At this point, the problem can be completed as any normal distribution problem.
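The utility-bills problem can be finished numerically as follows. This is a sketch under the assumption µ = λ = 42 and σ = √λ; the exact Poisson tail is computed alongside only as a comparison and does not appear in the slides.

```python
import math
from statistics import NormalDist

lam = 42  # mean arrivals per hour

# Normal approximation with continuity correction: P(X > 50.5)
approx = 1 - NormalDist(lam, math.sqrt(lam)).cdf(50.5)

# Exact Poisson: 1 - P(X <= 50), building each term from the previous one
term = math.exp(-lam)  # P(X = 0)
cdf = term
for k in range(1, 51):
    term *= lam / k    # P(X = k) from P(X = k - 1)
    cdf += term
exact = 1 - cdf

print(round(approx, 4), round(exact, 4))
```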
7.6 Exponential Distribution
LO7-7: Find the exponential probability for a given x.
Characteristics of the Exponential Distribution
• If events per unit of time follow a Poisson distribution, the time until the next event follows the Exponential distribution.
• The time until the next event is a continuous variable.
NOTE: Here we will find probabilities > x or ≤ x.
Characteristics of the Exponential Distribution
Probability of waiting more than x: P(X > x) = e^(–λx)
Probability of waiting less than or equal to x: P(X ≤ x) = 1 – e^(–λx)
Example: Customer Waiting Time
• Between 2 P.M. and 4 P.M. on Wednesday, patient insurance inquiries arrive at Blue Choice insurance at a mean rate of 2.2 calls per minute.
• What is the probability of waiting more than 30 seconds (i.e., 0.50 minutes) for the next call?
• P(X > 0.50) = e^(–λx) = e^(–(2.2)(0.5)) = .3329, or a 33.29% chance of waiting more than 30 seconds for the next call.
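The waiting-time calculation is easy to reproduce. The two helper functions below are hypothetical names for the exponential tail probability e^(–λx) and its complement.

```python
import math

def prob_wait_more_than(x, lam):
    """P(X > x) for an exponential waiting time with rate lam."""
    return math.exp(-lam * x)

def prob_wait_at_most(x, lam):
    """P(X <= x), the complement of the tail probability."""
    return 1 - math.exp(-lam * x)

lam = 2.2  # calls per minute
p_more = prob_wait_more_than(0.50, lam)
p_less = prob_wait_at_most(0.50, lam)
print(round(p_more, 4), round(p_less, 4))
```

Note that x and λ must use the same time unit (minutes here), which is why 30 seconds is entered as 0.50.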
[Figure: shaded areas for P(X > 0.50) and P(X ≤ 0.50)]
LO7-8: Solve for x for given exponential probability.
Inverse Exponential
• If the mean arrival rate is 2.2 calls per minute, we want the 90th percentile for waiting time (the top 10% of waiting time).
• Find the x-value that defines the upper 10%.
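The slide with the worked solution did not survive extraction, so the following is a reconstruction under the usual approach: set e^(–λx) = .10 and solve for x, giving x = –ln(.10)/λ. The function name is hypothetical.

```python
import math

def exponential_percentile(p, lam):
    """Return x such that P(X <= x) = p for an exponential with rate lam."""
    return -math.log(1 - p) / lam

lam = 2.2  # calls per minute
x90 = exponential_percentile(0.90, lam)  # waiting time exceeded only 10% of the time
mean_wait = 1 / lam                      # mean time between events is 1/lambda

print(round(x90, 4), round(mean_wait, 4))
```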
Mean Time Between Events
C
hapter
7.7 Triangular Distribution7.7 Triangular
Ch t i ti f th T i l Di t ib tiCh t i ti f th T i l Di t ib ti
7
LO7LO7--9: 9: Use the triangular distribution for “whatUse the
triangular distribution for “what--if” analysis (optional).if”
analysis (optional).
Characteristics of the Triangular DistributionCharacteristics of
the Triangular Distribution
• The triangular distribution is a way of thinking about variation that corresponds rather well to what-if analysis in business.
• It is not surprising that business analysts are attracted to the triangular model.
• Its finite range and simple form are more understandable than a normal distribution.
• It is more versatile than a normal, because it can be skewed in either direction.
• Yet it has some of the nice properties of a normal, such as a distinct mode.
• The triangular model is especially handy for what-if analysis when the business case depends on predicting a stochastic variable (e.g., the price of a raw material, an interest rate, a sales volume).
• If the analyst can anticipate the range (a to c) and most likely value (b), it will be possible to calculate probabilities of various outcomes.
• Many times, such distributions will be skewed, so a normal wouldn't be much help.
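As a sketch of how such probabilities might be calculated, the function below implements the standard triangular CDF using the slides' notation (a = minimum, b = most likely value, c = maximum). The price figures 50, 60, and 80 are hypothetical and chosen only to illustrate a right-skewed case.

```python
def triangular_cdf(x, a, b, c):
    """P(X <= x) for a triangular distribution with min a, mode b, max c."""
    if x <= a:
        return 0.0
    if x >= c:
        return 1.0
    if x <= b:
        # Rising side of the triangle
        return (x - a) ** 2 / ((c - a) * (b - a))
    # Falling side of the triangle
    return 1 - (c - x) ** 2 / ((c - a) * (c - b))

# Hypothetical raw-material price: at least 50, most likely 60, at most 80
a, b, c = 50, 60, 80
p_below_mode = triangular_cdf(60, a, b, c)
p_above_70 = 1 - triangular_cdf(70, a, b, c)
print(round(p_below_mode, 4), round(p_above_70, 4))
```

Because the mode sits closer to a than to c, only one third of the probability lies below the most likely value.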