STT 200
STATISTICAL METHODS
Chapter 2:
Foundation for Inference – Part 2
2.5 The Central Limit Theorem
■ Here are a few of the null distributions we’ve looked at
throughout this chapter.
■ What patterns can you identify regarding their shape?
The null distribution for testing
independence of precipitation vs.
type of day of the week.
The null distribution for testing
whether a new sales pitch is
better than a current one.
2
Government shutdown
The null distribution for testing whether network
political inclination is independent of reported polling
results about the government shutdown.
3
The Central Limit Theorem (CLT) - 1
■ In Chapter 1, we learned distributions are often left- or right-
skewed.
■ However, these null distributions are symmetric and relatively
bell-shaped.
■ Do you think this is a coincidence?
■ TOP HAT
■ This is NOT a coincidence! The shape of these distributions is,
in fact, mathematically guaranteed by the Central Limit
Theorem.
■ It says that under certain conditions, in the limit, certain
sample statistics are approximately normally distributed!
■ Average behavior is normal.
■ Prof. Shlomo Levental
4
The Central Limit Theorem (CLT) - 2
■ If we look at a proportion (or difference in proportions)
and the scenario meets certain conditions,
■ then the sample proportion (or difference in proportions)
will appear to follow a bell-shaped curve called
a normal distribution.
The equation of the curve is
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
Center is Mean: 𝜇
Spread is SD: 𝜎
5
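The density formula can be checked numerically. The sketch below uses only the Python standard library; the function name `normal_density` is illustrative and not part of the lecture. It evaluates the formula directly and compares the result with `statistics.NormalDist.pdf`.

```python
import math
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Evaluate f(x) = 1/(sigma*sqrt(2*pi)) * exp(-0.5*((x - mu)/sigma)**2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Height of the standard normal curve at its center, x = mu = 0.
peak = normal_density(0, 0, 1)

# The library implementation should agree with the hand-written formula.
library_peak = NormalDist(0, 1).pdf(0)
```

Changing `mu` moves the peak left or right; changing `sigma` makes the curve taller and narrower or shorter and wider.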
Conditions for the CLT
In order for the CLT to apply, two conditions must be true:
1. Observations in the sample(s) are independent.
■ Independence is often guaranteed in an observational study by
taking a random sample from a population.
■ It can also be guaranteed in the context of a controlled experiment if
we randomly divide individuals into treatment groups.
2. The sample size is sufficiently large.
■ In order for the null distribution to take on the shape of a normal
curve, we must have gathered a sufficiently large sample of data,
regardless of whether it is an observational study or controlled
experiment.
■ Just how large is large enough?
■ That differs from one context to the next, and we’ll provide
guidelines as we encounter them through the rest of the semester.
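A small simulation can make the two conditions concrete. This is only a sketch: the values p = 0.3, n = 100 and 2000 repetitions are assumptions chosen for illustration, and the function name is hypothetical. Each simulated sample consists of independent draws, and the resulting sample proportions pile up in a roughly bell-shaped pattern around p.

```python
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` independent samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

# Illustrative values (assumptions, not from the lecture).
props = simulate_sample_proportions(p=0.3, n=100, reps=2000)

center = statistics.mean(props)   # lands near p
spread = statistics.stdev(props)  # lands near sqrt(p * (1 - p) / n)
```

Re-running with a much smaller n (say 5) gives a lumpier, more skewed set of proportions, which is why the sample-size condition matters.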
6
2.6 The Normal Distribution
Here are three different normal curves. What do they share in common?
Normal curves always have the following five characteristics:
1. Unimodal (single peak)
2. Symmetric
3. Bell Shaped (or Mound Shaped)
4. Center is the mean and Spread is the standard deviation
5. Area under any normal curve is probability and Total area/probability is 1
7
Shape of the Normal Curve
■ Despite these common characteristics, normal
distributions can look quite different, as you can see
above.
■ Specifically, the normal distribution can be adjusted
using two parameters, the mean and the standard
deviation.
■ Change Center: Changing the mean of a normal curve
shifts the curve to the left or right.
■ Change Spread: Changing the standard deviation of a
normal curve
stretches or constricts the curve around the mean.
8
Labelling the Normal Curve
■ If a normal curve has mean 𝜇 and standard deviation 𝜎, statisticians
will write the distribution as the 𝑵(𝝁, 𝝈) distribution.
■ The three distributions above can be written (from left to right) as
the 𝑁(0,1), the 𝑁(1,1.5) and 𝑁(−2, 0.7) distributions.
9
Standard Normal Distribution
■ Because the mean and standard
deviation describe a normal
distribution exactly, they are called
the distribution’s parameters.
■ MATRIX Movie: NEO
■ SPECIAL NORMAL DISTRIBUTION:
When a normal curve has
mean 𝜇 = 0 and standard deviation
𝜎 = 1, we label the curve the
Standard normal or N(0,1) or Z curve.
10
Using Calculator to find Probabilities, Areas
and Percentiles on Page 81
To find a probability if a data value is known:
2nd Vars – “normalcdf” – enter “lower limit, upper limit,
mean, sd”
Example: 𝑃(900 ≤ 𝑋 ≤ 1200) where
𝜇 = 1060 𝑎𝑛𝑑 𝜎 = 195
Enter 2nd Vars – normalcdf (900, 1200, 1060, 195) enter.
Answer 0.557644
To find data values when given an area (or percentage):
2nd Vars – “invnorm” – enter (enter area to the left as decimal,
mean, sd)
Example: Find the score or data value corresponding to the 80th
percentile where 𝜇 = 1060 𝑎𝑛𝑑 𝜎 = 195
2nd Vars – “invnorm” – (0.80, 1060, 195) enter
Answer: 1224
Step-by-step instructions with examples at
https://msu.edu/~fairbour/MSU/CalculatorHelps/NormalCurveCalcInstructions.pdf
Normal tables in olden days!!!
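For readers working on a computer instead of a graphing calculator, the two calculator functions can be mimicked in Python. This is a minimal sketch using the standard library's `statistics.NormalDist`; the helper names `normalcdf` and `invnorm` are chosen here to mirror the calculator menu and are not library functions.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

def invnorm(area_left, mean, sd):
    """Data value that has the given area (as a decimal) to its left."""
    return NormalDist(mean, sd).inv_cdf(area_left)

p_between = normalcdf(900, 1200, 1060, 195)  # about 0.5576, as on the slide
score_80th = invnorm(0.80, 1060, 195)        # about 1224, as on the slide
```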
11
Example: SAT scores
■ Cumulative SAT scores are approximated well by a
normal model, 𝑁(1060, 195).
■ Provide a sketch of the approximating normal curve.
12
Applying Z Scores: SAT scores 1
1. Approximately what proportion of test takers score between
900 and 1200 on the SAT?
Given: Data Values
To find: Proportion/Probability/Area
Sketch and label center data values
Press 2nd Press Vars
Choose Normalcdf
Lower: 900
Upper: 1200
Mean: 1060
SD: 195
Answer: 0.5576
13
Finding probabilities – known data values
Step 1: Sketch a picture of the area you’re trying to find.
Step 2: Compute the area using a calculator / computer
software.
■ To find a probability if a data value is known:
– 2ND VARS – “normalcdf” – enter “lower limit, upper
limit, mean, sd”
■ Example: 𝑃(900 ≤ 𝑋 ≤ 1200)
– Press 2nd Vars – normalcdf (900, 1200, 1060, 195)
Enter.
– Answer 0.557644
14
Applying Z Scores: SAT scores 2
2. A randomly-selected SAT test-taker is about to sit for the test.
Nothing is known about her aptitude. What is the probability that
she scores at least 1300 on her SATs?
Given: Data Values
To find: Probability/Area
2nd Vars Normalcdf
Lower: 1300
Upper: 10^10
Mean: 1060
SD: 195
Answer: 0.1092
15
Applying Z Scores: SAT scores 3
3. Another SAT test-taker is taking the SAT for a second time
after earning a 1100 on his first attempt. What was the
percentile of his first score?
DO THE FOLLOWING NOW!!!
■ Sketch the normal curve
■ Label Center
■ Mark the data values
■ Shade the required area
■ Find the probability using GC
■ Think: What is the lower limit?
■ TOP HAT
16
Applying Z Scores: SAT scores 3
3. Another SAT test-taker is taking the SAT for a second time
after earning a 1100 on his first attempt. What was the
percentile of his first score?
Answer:
■ Normalcdf
Lower: −10^10
Upper: 1100
Mean: 1060
SD: 195
Answer: 58th Percentile
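The calculator needs a finite number for every limit, which is why 10^10 and −10^10 stand in for "no upper limit" and "no lower limit". In software the tails can be computed directly. A hedged sketch with the standard library (variable names are illustrative):

```python
from statistics import NormalDist

sat = NormalDist(mu=1060, sigma=195)

# Upper tail: P(X >= 1300) is the complement of the area to the left of 1300.
p_at_least_1300 = 1 - sat.cdf(1300)

# Lower tail: the percentile of a score is the area to its left.
percentile_1100 = sat.cdf(1100)

# Using 10^10 as the upper limit gives the same area as the complement.
p_with_big_bound = sat.cdf(10**10) - sat.cdf(1300)
```

The results are about 0.1092 and 0.58, matching the calculator answers above.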
17
Applying Z Scores: SAT scores 4
4. What is the SAT score of someone who scores at the 80th
percentile?
Given: Percentile/Probability
To find: Data value
Should we use Normalcdf???
No, USE 2nd Vars invNorm
Must enter left side area!!!
Area: 0.80
Mean: 1060
SD: 195
Answer: 1224
18
Finding data values with known area
Step 1: Sketch a picture with the data value you’re trying to
find.
Step 2: Compute the data value using a calculator /
computer software.
■ To find data values when given an area (or percentage):
– 2ND VARS– “invnorm” –(enter area to the left as a
decimal, mean, sd)
■ Example: Find data value corresponding to 80th
percentile.
– 2ND VARS – invNorm(0.80, 1060, 195) enter
– Answer: 1224.116
19
Standardizing with Z scores: Formula!
■ Often, it is valuable to quantify how far an observation falls from its
mean or expected value.
■ Recall that the SD gives us the typical distance an
observation falls from its mean or expected value.
■ Standardized score or z-score:
$z = \dfrac{\text{observed value} - \text{expected value}}{\text{standard deviation}} = \dfrac{x - \mu}{\sigma}$
■ We can interpret a z-score as quantifying the number of standard
deviations an observation falls from its mean or expected value.
■ In using this formula, some Normal Variable X is converted to
Standard Normal Variable Z
■ There is an interesting connection between the area under the
normal curves of X and Z.
20
ACT vs. SAT
There are two major tests of readiness for college, the ACT and
the SAT.
■ ACT scores are reported on a scale from 1 to 36. The
distribution of ACT scores for more than 1 million students in a
recent high school graduating class was mound-shaped and
symmetric with
– mean = 20.8 and sd = 4.8.
■ SAT scores are reported on a scale from 400 to 1600. The
distribution of SAT scores for 1.4 million students in the same
graduating class was mound-shaped and symmetric with
– mean = 1060 and sd = 195.
■ Tonya scores 1320 on the SAT. Jessie scores 28 on the ACT.
■ Both seem to have done well.
■ But can you tell who did better on their respective test?
21
Who did better on college prep test?
■ ACT: average = 20.8, SD = 4.8
■ SAT: average = 1060, SD = 195
– Tonya scores 1320 on the SAT. Jessie scores 28 on
the ACT. Assuming that both tests measure the same
thing, who has the higher score (relatively)?
■ Question: Who did better?
■ Who is further away from the mean?
■ Calculate the z-scores:
■ TOP HAT
$z = \dfrac{x - \mu}{\sigma}$
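The comparison can be worked through in a few lines. This sketch simply applies the z-score formula to the two scores given on the slide; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x falls from the mean."""
    return (x - mean) / sd

z_tonya = z_score(1320, 1060, 195)  # SAT: about 1.33
z_jessie = z_score(28, 20.8, 4.8)   # ACT: 1.5

# The larger z-score is the better result relative to its own test.
better = "Jessie" if z_jessie > z_tonya else "Tonya"
```

Jessie's score sits 1.5 standard deviations above the ACT mean, while Tonya's sits about 1.33 above the SAT mean, so Jessie did relatively better.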
22
Interpreting z-scores Extra question
■ Police department salaries in San Francisco have a
mean of $90,702 and an sd of $45,321.
■ A chief of police’s salary has a z-score of z = 4.84.
Interpret this z-score.
– A. The chief of police makes 4.84 times as much as
the average employee salary.
– B. The chief of police’s salary is nearly 5 standard
deviations above the average employee salary.
– C. Only 4.84% of employees make more than the
chief of police.
■ TOP HAT
23
Z-score formula preserves …
■ Use normalcdf to calculate the approximate percentage
of students who scored better than Jessie on the ACT.
■ Use normalcdf to calculate the area above z = 1.5 using the
𝑁(0, 1) distribution. Recall, Jessie’s z-score is 1.5.
PAGE 82: Add this before finding probabilities with z-scores.
■ What do you notice about the probabilities?
■ TOP HAT
24
Z-score formula preserves …
P(Z > 1.5) = 0.0668
P(ACT > 28) = 0.0668
Recall that Jessie’s ACT score was 28 and her z-score was 1.5.
25
Z-score formula preserves probability
■ Use normalcdf to calculate the approximate percentage of
students who scored better than Jessie on the ACT.
■ Use normalcdf to calculate the area above z = 1.5 using the
𝑁(0, 1) distribution. Recall, Jessie’s z-score is 1.5.
PAGE 82: Add this before finding probabilities with z-scores.
■ What do you notice about the probabilities?
■ The probabilities are the same.
■ Why?
■ Z-score formula preserves area under normal curves
(probability).
P(ACT < a) = P( Z < a*)
where a* is the z-score of a.
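The equality of the two areas can be confirmed numerically. A minimal sketch with the standard library, using the ACT model from the slides:

```python
from statistics import NormalDist

act = NormalDist(mu=20.8, sigma=4.8)
z = NormalDist(mu=0, sigma=1)

# Convert Jessie's ACT score to a z-score.
z_jessie = (28 - act.mean) / act.stdev

p_act_above_28 = 1 - act.cdf(28)   # area above 28 on the ACT curve
p_z_above = 1 - z.cdf(z_jessie)    # area above 1.5 on the Z curve
```

Both areas come out to about 0.0668, which is the sense in which standardizing preserves probability.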
26
Finding probabilities for z scores (1)
1. Find P(−1 ≤ 𝑍 ≤ 1)
In words, find the probability that the standard normal
variable takes values within one standard deviation of the
mean.
Hint: Remember Matrix? Neo!
Use normalcdf with 𝑵(𝟎, 𝟏)
Lower: -1
Upper: 1
Mean: 0
SD: 1
Answer: 0.6827
27
Finding probabilities for z scores (2)
2. What is the probability that a standard normal variable Z
is within 2 standard deviations of the mean?
That is, find P(-2 ≤ Z ≤ 2).
■ TOP HAT
28
Finding probabilities for z-scores (3)
2. What is the probability that a standard normal variable Z
is within 2 standard deviations of the mean?
That is, find P(-2 ≤ Z ≤ 2).
■ TOP HAT
■ Answer: normalcdf (-2, 2, 0, 1) = 0.9545
3. Find P(-3 ≤ Z ≤ 3).
Answer: normalcdf (-3, 3, 0, 1) = 0.9973
29
Normal curves and the empirical rule
For data that follows a normal distribution,
■ Approximately 68% of the data will have a z-score
between -1 and 1.
■ Approximately 95% of the data will have a z-score
between -2 and 2.
■ Approximately 99.7% of the data will have a z-score
between -3 and 3.
■ So, in general:
■ 68%, 95% and 99.7%
lie within one, two and
three SDs of the
mean.
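The three percentages in the empirical rule are rounded normal-curve areas. A short sketch that recomputes them (the helper name `area_within` is illustrative):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal, N(0, 1)

def area_within(k):
    """Area under the Z curve between -k and k."""
    return z.cdf(k) - z.cdf(-k)

within = {k: area_within(k) for k in (1, 2, 3)}
# within[1] is about 0.6827, within[2] about 0.9545, within[3] about 0.9973
```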
30
Finding z-scores for probabilities
4. What z-scores provide the bounds for the middle 50% of the standard
normal distribution?
■ THINK: What do we want to find? Two bounds so middle area is 0.5.
■ Let us call the upper bound a.
■ Then the lower bound is −a by symmetry. So, we just need to find a.
■ THINK: Which function should we use in GC? TOP HAT
■ THINK: Middle Area is 0.5 then what is the left side area for the
negative bound -a?
■ THINK: Middle Area is 0.5 then what is the left side area for the
positive bound a? TOP HAT
Z~N(0,1)
What is this
z-score?
31
Finding z-scores for probabilities
4. What z-scores provide the bounds for the middle 50% of
the standard normal distribution?
On the Z curve we have middle 50%
Remaining area is 50%, so 25% on each side!
32
Finding z-scores for probabilities
4. What z-scores provide the bounds for the middle 50% of the standard normal
distribution?
The left-side area for a is 0.75 (not 75); it must be a decimal between 0 and 1.
Answer: invNorm(0.75, 0, 1) = 0.674 and invNorm(0.25, 0, 1) = −0.674
So, −0.674 and 0.674 are the bounds for the middle 50%.
33
Finding z-scores for probabilities
5. What z-scores provide the bounds for the middle 95% of
the standard normal distribution?
■ THINK: Which function?
■ THINK: What is the left side area?
■ TOP HAT
34
Finding z-scores for probabilities
5. What z-scores provide the bounds for the middle 95% of
the standard normal distribution?
■ THINK: Which function?
■ THINK: What is the left side area?
■ TOP HAT
■ Answer: invNorm (0.975, 0, 1) = 1.96
■ So, -1.96 and 1.96 are the bounds for the middle 95% of
the standard normal distribution.
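The "middle area to left-side area" step is the part that is easy to get wrong, so here is a sketch of it in code. The function name `middle_bounds` is hypothetical; the conversion it uses is left area = (1 + middle area) / 2, because half of the leftover area sits in each tail.

```python
from statistics import NormalDist

def middle_bounds(middle_area):
    """z-scores that cut off the central `middle_area` of the standard normal curve."""
    left_area = (1 + middle_area) / 2        # area to the left of the upper bound
    upper = NormalDist().inv_cdf(left_area)  # same role as invNorm(left_area, 0, 1)
    return -upper, upper                     # lower bound follows by symmetry

bounds_50 = middle_bounds(0.50)  # about (-0.674, 0.674)
bounds_95 = middle_bounds(0.95)  # about (-1.96, 1.96)
```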
35
Example: IQ scores (1)
■ IQ test scores are formulated to be normally distributed, that
is, they follow the shape of a normal curve.
Suppose we have the following sample of 30 IQ scores:
65 80 81 83 85 89 90 91 91 92 94 95 97 97 97
97 99 100 101 101 101 102 104 105 106 107 109 112 120 121
■ Let us verify whether these scores are from a Normal model:
Mean: 97.1 and SD: 11.7.
1. How many of these scores are within 1 standard deviation of
the mean? HINT: Count them!
■ First find the range of IQ scores that are within 1 SD of the
mean.
■ Mean ± 1 x SD = 97.1 ± 11.7. So, between 85.4 and 108.8
36
Example: IQ scores (1)
■ IQ test scores are formulated to be normally distributed,
that is, they follow the shape of a normal curve.
Suppose we have the following sample of 30 IQ scores:
65 80 81 83 85 89 90 91 91 92 94 95 97 97 97
97 99 100 101 101 101 102 104 105 106 107 109 112 120 121
1. How many of these scores are within 1 standard
deviation of the mean? HINT: Count them!
■ We count 21 IQ scores between 85.4 and 108.8
■ So, 21/30 = 70% of IQ scores lie within 1 SD of the
mean
37
Example: IQ scores (2)
■ IQ test scores are known to be normally distributed, that
is, they follow the shape of a normal curve. Suppose we
have the following sample of 30 IQ scores:
65 80 81 83 85 89 90 91 91 92 94 95 97 97 97
97 99 100 101 101 101 102 104 105 106 107 109 112 120 121
■ How many of these scores are within 2 standard
deviations of the mean?
■ Mean ± 2 x SD = 97.1 ± 2 x 11.7
■ That is, between 73.7 and 120.5
■ We count 28 of the scores, so 28/30 = 93.33%
38
Example: IQ scores (3)
■ IQ test scores are known to be normally distributed, that is, they
follow the shape of a normal curve. Suppose we have the following
sample of 30 IQ scores:
65 80 81 83 85 89 90 91 91 92 94 95 97 97 97
97 99 100 101 101 101 102 104 105 106 107 109 112 120 121
■ How many of these scores are within 3 standard
deviations of the mean?
■ Mean ± 3 x SD = 97.1 ± 3 x 11.7
■ That is, between 62.0 and 132.2
■ All of them. So, 30/30 = 100%.
■ Do you think the data follows the empirical rule well?
■ Yes, the observed 70%, 93.33% and 100% are close to 68%,
95% and 99.7%, so the data seems to follow the empirical rule.
We can believe that this data is from a Normal population.
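The counting in this example can be automated. The sketch below uses the mean and SD as given on the slides (97.1 and 11.7); the function name `share_within` is illustrative.

```python
iq_scores = [
    65, 80, 81, 83, 85, 89, 90, 91, 91, 92, 94, 95, 97, 97, 97,
    97, 99, 100, 101, 101, 101, 102, 104, 105, 106, 107, 109, 112, 120, 121,
]
MEAN, SD = 97.1, 11.7  # summary values as given on the slides

def share_within(k):
    """Count and proportion of scores within k SDs of the mean."""
    low, high = MEAN - k * SD, MEAN + k * SD
    count = sum(low <= score <= high for score in iq_scores)
    return count, count / len(iq_scores)

results = {k: share_within(k) for k in (1, 2, 3)}
# counts: 21, 28 and 30 out of 30, compared with 68%, 95% and 99.7%
```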
39
Evaluating the normal approximation
■ Many data sets can be well-approximated by the normal
distribution.
■ We saw earlier that SAT scores, ACT scores and IQ scores
are well-approximated by the normal model.
■ While these models are helpful and convenient, we must
remember that they are only an approximation.
■ Often, it is important to evaluate just how good (or bad)
of an approximation the normal model is when applied to
a scenario.
■ There are two simple visual ways to assess whether a
normal approximation is appropriate:
1. Histogram
2. QQ-plot (normal probability plot)
40
Histograms and QQ plots
■ Here are the histogram and QQ plot for the IQ score example:
■ A QQ-plot is short for quantile-quantile plot
■ (quantile = percentile rank)
■ Sample’s quantiles/percentile ranks on the vertical axis and Z’s
quantiles/percentile ranks on the horizontal axis
■ For example, sample Q1 will be matched with Z’s Q1, etc.
■ In a QQ-plot, the closer the dots are to a perfect straight line, the more
confident we can be that our data follow the normal model.
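To see what a QQ-plot actually plots, the coordinates can be computed by hand. This is a sketch under one assumption: it uses the plotting position (i + 0.5) / n for the i-th sorted value (counting from 0), which is one common convention; statistical software may use a slightly different one. The function name `qq_points` is hypothetical.

```python
from statistics import NormalDist

def qq_points(sample):
    """Pair each sorted observation with the matching standard normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    z = NormalDist()
    # Horizontal coordinate: Z quantile; vertical coordinate: sample value.
    return [(z.inv_cdf((i + 0.5) / n), value) for i, value in enumerate(ordered)]

iq_scores = [
    65, 80, 81, 83, 85, 89, 90, 91, 91, 92, 94, 95, 97, 97, 97,
    97, 99, 100, 101, 101, 101, 102, 104, 105, 106, 107, 109, 112, 120, 121,
]
points = qq_points(iq_scores)
```

Plotting these pairs with any charting tool gives the QQ-plot; points that fall close to a straight line support the normal model.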
41
Histograms and QQ-plots 1
Consider the following histograms of data sets along
with the QQ-plots:
42
Histograms and QQ-plots 2
43
Histograms and QQ-plots 3
For which of the data sets would you recommend using a
normal curve to model the distribution?
TOP HAT
44
2.7 Applying the normal model (1)
Standard Error
■ Sample statistics or Point estimates vary from sample to
sample, and it is often valuable to quantify that variability
with what is called the standard error (SE).
■ The standard error of a point estimate is
approximately equal to the standard deviation associated
with the estimate.
■ For example, if we look at the normal approximation of
the distribution of sample proportions, then the standard
error will be used as the standard deviation of sample
proportions, $SE_{\hat{p}}$.
45
Applying the normal model (2)
■ For instance, if we had a sample statistic/point estimate
with 𝑆𝐸 = 4.2 units, that would mean that this point
estimate, over many repeated samples, would be
approximately 4.2 units away from the parameter it
estimates, on average.
■ The way we compute the 𝑆𝐸 of a point estimate varies for
different types of point estimates.
■ We will cover these computations in more detail in later
chapters. For now, let’s return to some familiar research
scenarios.
46
Example: “Boomeranging” (1)
■ We are on PAGE 86
■ Recall the research study that investigated whether the rate at
which men ‘boomeranged’ back to their parents’ homes as
adults had changed from its 1997 level of 13%.
■ The researchers took a random sample of 150 adult men and
found that 25 of them had left their parental home and then
returned.
The hypotheses that were tested were:
■ 𝐻0: There has been no change in the rate of ‘boomeranging’
among young men. The percentage is still 13% and any
difference in the sample is due to chance.
■ 𝐻𝑎: There has been a change in the rate of ‘boomeranging’ for
young men.
47
Randomization simulation
48
Example: “Boomeranging” (2)
1. Use the information above to report a point estimate for
the current rate of ‘boomeranging’ among young men, the
approximate p-value from this randomization test, and then
evaluate the evidence with regards to the associated
hypotheses.
■ Sample proportion or Point estimate = 25/150 = 0.1667
■ The approximate p-value from two tails is 0.2265
■ There is very little evidence that the null model is not a
good fit for the observed results.
2. Now try to reproduce this p-value using a normal
distribution approach.
49
Example: “Boomeranging” (3)
2. Create a quick sketch in the space below of a normal
distribution centered at 0.13 with a standard error of
0.0275.
50
Example: “Boomeranging” (4)
Interpret the standard error of 0.0275.
Hint for Exam2:
■ Here our sample proportion of boomerangers
has a standard error, SE = 0.0275.
■ INTERPRETATION: This means that point estimates/sample
proportions, on average, over many repeated samples, would be
approximately 0.0275 units away from the parameter/population
proportion of boomerangers of 0.13.
51
Example: “Boomeranging” (5)
2. Calculate the Z score using the observed ‘boomeranging’
rate of 25/150 = 0.1667, along with the mean and standard
error of the normal model. [Notice how we use the standard
error of the statistic as the standard deviation to find the z
score.]
$z = \dfrac{x - \mu}{\sigma} = \dfrac{0.1667 - 0.13}{0.0275} = 1.3345$
52
Example: “Boomeranging” (6)
3. Identify the p-value corresponding
to this z score.
■ Recall, area under Z curve is the same
as corresponding area under the
normal curve for sample proportions
■ normalcdf(1.3345, 10^10, 0, 1)
■ Right tail area = 0.091.
Or could do,
normalcdf(0.1667, 10^10, 0.13, 0.0275)
This is not recommended, as later on, we
have different formulas for S.E. and it gets
confusing.
53
Example: “Boomeranging” (7)
Recall:
Observed boomeranging rate
is 0.1667
𝑧 = 1.3345
■ What is the definition of p-value?
■ Under the null model, it is the probability of observing a
boomeranging rate at least as far from 0.13 as the observed
rate of 0.1667.
■ TOP HAT: To find p-value should we multiply the right tail area
by 2? How can we tell whether to multiply by 2 or not???
54
Example: “Boomeranging” (8)
3. Identify the p-value corresponding to this z score. How
does it compare to the p-value from the randomization
simulation? Would we make the same evaluation regarding
the null model?
■ In terms of z-scores, it is the probability of observing a z-score of
1.3345 and beyond, that is, z-scores less than -1.3345 and z-scores
greater than 1.3345.
■ We have a two tailed test, so by symmetry,
p-value = 2 x 0.091 = 0.182
■ For randomization based test
the approximate p-value was 0.2265.
■ Here, we have a slightly smaller p-value
but the evaluation is still the same!
■ There is LITTLE evidence that
the null model IS NOT a good fit
for the observed results.
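The whole normal-approximation test for the boomeranging example fits in a few lines. This sketch uses the numbers from the slides (observed 0.1667, null value 0.13, SE 0.0275); variable names are illustrative.

```python
from statistics import NormalDist

observed, null_value, se = 0.1667, 0.13, 0.0275  # values from the slides

# Standardize the observed rate under the null model.
z = (observed - null_value) / se

# Area to the right of z on the Z curve.
right_tail = 1 - NormalDist().cdf(z)

# H_a is "a change" in either direction, so both tails count; by symmetry, double.
p_two_sided = 2 * right_tail
```

This reproduces z of about 1.3345, a right tail of about 0.091 and a two-sided p-value of about 0.182.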
55
Example: Web Design (1)
■ Recall the observational study undertaken by an online
art gallery to see whether investing in a website redesign
would increase the percentage of premium accounts
from its current rate of 25%.
■ The gallery surveyed 500 users and found that 150 of
them say they would continue or purchase a premium
account if the new features were included.
The hypotheses tested were:
■ 𝐻0: The percentage of premium accounts will not change
after including additional features on the website; it will
remain at 25%. Any difference from this rate is due to
chance involved in the sampling process.
■ 𝐻𝑎: The percentage of premium accounts will change
after including additional features on the website.
56
Example: Web Design (2)
From the randomization test, we could see that the p-value
was small and we were inclined to think that the percentage
of premium accounts might increase.
1. Now try to reproduce this p-value using a normal
distribution approach. Create a quick sketch in the space
below of a normal distribution centered at 0.25 with a
standard error of 0.019.
57
Example: Web Design (3)
2. Calculate a Z score using the observed premium-account
rate of 150/500 = 0.3, along with the mean and standard error of
the normal model. [Notice how we use the standard error of
the statistic as the standard deviation for the z score.]
■ TOP HAT: To find p-value should we multiply the right tail
area by 2? How can we tell whether to multiply by 2 or
not???
$z = \dfrac{x - \mu}{\sigma} = \dfrac{0.3 - 0.25}{0.019} = 2.6316$
58
Example: Web Design (4)
3. Identify the p-value corresponding to this z score. How does it
compare to the p-value from the randomization simulation? Would we
make the same evaluation regarding the null hypothesis?
■ Normalcdf (2.6316, 10^10, 0, 1)
■ P-value = 0.0042 (one-tail test)
■ From Section 2.4, for the
randomization based
test, we had a p-value less than
0.001, that is, extremely strong
evidence that null model is not a
good fit for our data.
■ Whereas, here we only have very
strong
evidence that null model is not a
good fit.
■ In both cases, we will recommend
redesign of website to the owners.
59
2.8 Confidence Intervals
We are on PAGE 88
■ A sample statistic provides a single plausible value for a
population parameter using collected sample data.
■ That is why, we call this sample statistic a point estimate.
■ Sometimes, it is more useful to provide a plausible range
of values for that parameter.
■ Statisticians call this plausible range of values a
confidence interval.
■ Suppose we have a large bin filled
with small green and white balls
and we want to know the proportion
of white balls in the
bin/box/population.
60
Example: Estimating the proportion
Question: What is the proportion of white balls in the bin?
In this scenario, we could count all the balls and find the
population proportion but that would take too much time.
■ Instead, let’s take a random sample of, say, 100 balls.
■ If our sample of 100 balls has 59 white and 41 green
balls, what is the sample proportion of white balls?
■ Sample proportion of white balls: $\hat{p}_{white} = \dfrac{59}{100} = 0.59$
■ We call this sample proportion a point estimate.
■ Based on this sample point estimate, give a plausible
range of values for the population proportion.
■ How confident are you that your answer is correct? TOP
HAT
61
Sample estimates vary
■ We know based on experience that sample estimates
vary, and that under certain conditions they will follow a
normal model.
■ The graph shows 3000 sample proportions for this
scenario.
■ Our sample proportion of
$\hat{p} = 0.59$ is somewhere in
this distribution, and likely
near the center of the
distribution but not at the
center.
62
Standard Error
■ Say, we can also estimate the standard error to be
0.048.
■ Since we’re talking about a sampling model, we call this
standard deviation the standard error.
■ We will learn how to calculate the standard error in
chapter 3.
63
Range of plausible values
■ Sample proportion is $\hat{p} = 0.59$
■ Standard error is 0.048
■ Use what you know about the normal distribution to construct
a new range of plausible values for the population proportion
of white balls.
■ Hint: Use empirical rule!
■ How confident are you that your answer is correct? TOP HAT
■ App for confidence intervals
https://shiny.stt.msu.edu/fairbour/Confidence/
■ How likely is it that the interval includes the true population
proportion? TOP HAT
■ How many out of 100 intervals do you expect to contain the
true population proportion? TOP HAT
64
The gist of confidence intervals (1)
1. The value of the sample estimate will vary from one
sample to the next.
■ The values vary around the population parameter.
2. The standard error of the sample estimate provides an
idea of how far away it would tend to vary from the
parameter value (on average).
3. The general format for a confidence interval is given by:
■ Sample (point) estimate ± (a few) standard errors
4. The “few” or number of standard errors we go out each
way from the sample estimate will depend on what coverage
rate (i.e., how confident) we want to be.
65
VIDEO ILLUSTRATION:
66
The gist of confidence intervals (2)
5. The “how confident” we want to be is referred to as the
confidence level.
■ This level reflects how confident we are in the procedure.
■ The confidence level is the percentage of the time we
expect the procedure to produce an interval that
contains the population parameter.
■ Most of the intervals that are made will contain the truth
about the population, but occasionally an interval will be
produced that does not contain the true parameter
value.
■ Each interval either contains the population parameter
or it doesn’t.
67
How many standard errors?
We are at the bottom of PAGE 89
■ This depends on the confidence level.
■ Given the standard normal distribution, what are the
boundaries for the middle 95%?
■ Use invNorm(0.975, 0, 1), see page 83 on Lecture guide.
■ -1.96 and 1.96 are the middle 95% boundaries.
■ If we set “a few” to be z = 1.96, then we can expect that 95%
of the sample proportions will be in the interval
population parameter ± 1.96 standard errors
■ This is a fact.
■ But, recall we are estimating the population parameter,
meaning we do not know the population parameter.
■ We know our point estimate/sample proportion!!!
68
Procedure and confidence
69
Let us look at this the other way round:
point estimate ± 1.96 standard errors
Constructing Confidence Intervals
Calculate the interval
point estimate ± 1.96 standard errors
for each of the following:
Standard error is 0.048
■ Sample 1 has point estimate $\hat{p} = 0.59$
■ Answer: (0.49592, 0.68408)
■ Sample 2 has $\hat{p} = 0.71$
■ Answer: (0.61592, 0.80408)
■ Sample 3 has $\hat{p} = 0.75$
■ Answer: (0.65592, 0.84408)
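The three intervals can be reproduced with a small helper. This is a sketch; the function name `confidence_interval` is illustrative, and the true proportion 0.66 used in the last line is the value the slides give for this example.

```python
def confidence_interval(point_estimate, se, multiplier=1.96):
    """Return (lower, upper) for point estimate ± multiplier × SE."""
    margin = multiplier * se
    return point_estimate - margin, point_estimate + margin

SE = 0.048
intervals = [confidence_interval(p_hat, SE) for p_hat in (0.59, 0.71, 0.75)]

# Check each interval against the true proportion stated in the slides.
contains_truth = [low <= 0.66 <= high for low, high in intervals]
```

In this case all three intervals happen to include 0.66.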
70
Which intervals include the parameter?
■ In this example, we actually know the true proportion is
𝑝 = 0.66.
■ Which of the intervals you calculated included this value?
71
(0.49592, 0.68408)
(0.61592, 0.80408)
(0.65592, 0.84408)
Population proportion=0.66
Key Idea
■ Because 95% of sample proportions are within 1.96
standard errors of the population parameter,
approximately 95% of the intervals we create using this
procedure will include the parameter.
■ More practice with this idea at
https://shiny.stt.msu.edu/fairbour/Confidence/
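The key idea can also be checked by simulation. The sketch below is an illustration under stated assumptions: sample proportions are modeled as draws from a normal curve centered at the true proportion 0.66 with standard error 0.048 (the values used in the slides), and the number of repetitions and the seed are arbitrary choices.

```python
from statistics import NormalDist

TRUE_P, SE = 0.66, 0.048  # values used in the slides

# Model many sample proportions as draws from N(TRUE_P, SE); the seed is arbitrary.
sample_props = NormalDist(TRUE_P, SE).samples(10_000, seed=1)

# Build an interval around each sample proportion and count how many capture TRUE_P.
hits = sum(p_hat - 1.96 * SE <= TRUE_P <= p_hat + 1.96 * SE for p_hat in sample_props)
coverage = hits / len(sample_props)
```

The coverage comes out close to 0.95: roughly 95 out of every 100 intervals built this way contain the parameter.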
72
Example 1: Pass or Fail?
■ Earlier, we encountered a research scenario where an
engineering instructor examined whether students with
an urban/suburban background are more likely to pass
the course than rural/small-town students. The table
below shows the results of this study.
Student Background | Pass | Fail | Total
Urban/Suburban     |  52  |  13  |  65
Rural/Small-town   |  30  |  25  |  55
Total              |  82  |  38  | 120
73
Example 1: Pass or Fail? Interval
■ The point estimate suggests that students from an urban/suburban
background are more likely to pass the course:
■ $\hat{p}_{Urban} - \hat{p}_{Rural} = \dfrac{52}{65} - \dfrac{30}{55} = 0.255$.
■ The standard error of this estimate is 𝑆𝐸 = 0.0852.
■ Construct a 95% confidence interval for the true difference in the
proportions of urban and rural students that pass the course.
point estimate ± 1.96 standard errors = 0.255 ± 1.96 x 0.0852
■ The 95% confidence interval for the true difference in the
proportions of urban and rural students that pass the course is
(0.088008, 0.421992)
■ 0.088008 is called the lower bound of the confidence interval.
■ 0.421992 is called the upper bound of the confidence interval.
■ The plausible values for the true difference are between 8.8% and
42.2%.
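The arithmetic for this interval, step by step. The sketch uses the counts from the table and the standard error given on the slide; the point estimate is rounded to three decimals to match the slide's 0.255.

```python
p_urban = 52 / 65  # proportion of urban/suburban students who passed
p_rural = 30 / 55  # proportion of rural/small-town students who passed

point_estimate = round(p_urban - p_rural, 3)  # 0.255, rounded as on the slide
SE = 0.0852                                   # standard error given on the slide

margin = 1.96 * SE
lower, upper = point_estimate - margin, point_estimate + margin
```

The bounds come out to about 0.088 and 0.422, the interval reported above.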
74
Example 2: Web Design Interval
■ Recall the observational study undertaken by an online
art gallery to see whether investing in a website redesign
would increase the percentage of premium accounts
from its current rate of 25%. The gallerists surveyed 500
users and found that 150 of them say they would
purchase a premium account if the new features were
included.
■ Earlier, we used the point estimate $\hat{p} = 0.3$ to conduct a
hypothesis test based on the normal distribution (with a
standard error of 0.019 under the null model).
■ Use $\hat{p} = 0.3$ and the standard error 𝑆𝐸 = 0.0205 to create
a 95% confidence interval for the proportion of Premium
account users.
TOP HAT
75
Example 2: Web Design Interval
Interpretation
■ Answer: The 95% CI for the proportion of Premium users
is (0.25982, 0.34018)
■ Notice that the value 0.25 does not fall within the 95%
confidence interval!
■ We can interpret this to mean the confidence interval
does not consider it to be a reasonable value for the true
percentage of premium accounts – this is consistent with
our evaluation of the evidence for the hypothesis test we
conducted earlier.
76
Interpretation
■ The phrase confidence level is used to describe the likelihood
or chance that a yet-to-be-constructed interval will actually
contain the true population value.
■ However, we have to be careful about how to interpret this
level of confidence if we have already computed our interval of
values.
■ The population parameter is not a random quantity, it does not
vary - once we have “looked” (computed) the actual interval,
we cannot talk about probability or chance for this particular
interval anymore.
■ Unlike in the movie “Harry Potter”!
■ https://youtu.be/81scFUYQGbU?t=78
■ The 95% confidence level applies to the procedure, not to an
individual interval; it applies “before you look” and not “after
you look” at your data and compute your observed interval of
values.
77
Changing the confidence level - 1
■ Suppose we want to create a range of plausible values for a
parameter that will have more than 95% confidence [i.e.,
create the interval using a process that will capture the
parameter more than 95% of the time].
■ Let’s return to our original formula for a 95% confidence
interval:
𝑝𝑜𝑖𝑛𝑡 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒 ± 1.96 ∗ 𝑆𝐸
■ Notice this interval has three components: the point estimate,
the standard error of that estimate, and a multiplier of 1.96.
■ invNorm(0.975, 0, 1) = 1.96
■ Recall that we chose this multiplier of 1.96 earlier after
observing that 95% of observations of a normally distributed
variable fall within 1.96 standard deviations of the mean.
78
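For readers without a graphing calculator, the multiplier can be checked in software. The sketch below uses Python's standard-library `statistics.NormalDist` as a stand-in for the calculator's invNorm and normalcdf; the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Counterpart of invNorm(0.975, 0, 1): the z-score with 97.5% of the area to its left.
z_95 = standard_normal.inv_cdf(0.975)

# Counterpart of normalcdf(-1.96, 1.96, 0, 1): area within 1.96 SDs of the mean.
middle_area = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(f"invNorm(0.975, 0, 1) = {z_95:.3f}")
print(f"Area between -1.96 and 1.96 = {middle_area:.4f}")
```

The left-side area is 0.975 rather than 0.95 because the remaining 5% is split evenly between the two tails.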
Changing the confidence level - 2
■ invNorm(0.995, 0, 1) = 2.576
■ invNorm(0.95, 0, 1) = 1.645
■ By extension, we could observe that 99% of observations
fall within 2.576 standard deviations of the mean, and
that only 90% of observations fall within 1.645 standard
deviations of the mean.
■ If the point estimate of a parameter follows a normal
model with standard error 𝑆𝐸, then a confidence interval
for that population parameter is:
𝑝𝑜𝑖𝑛𝑡 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒 ± 𝑧∗ ∗ 𝑆𝐸
where 𝑧∗ corresponds to the confidence level you’d like the
interval to have.
79
Confidence level multipliers
■ Use the table below to jot down some confidence levels
that are commonly seen in statistical studies, along with
their associated multipliers.
■ After looking at this table, you’ll probably notice a key
idea underlying confidence intervals:
■ If you want to be more confident in your interval of
plausible values, you need to make your interval wider.
Confidence Level    Multiplier 𝑧∗
90%                 1.645
95%                 1.96
99%                 2.576
80
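The multipliers in the table can be recovered from the confidence level itself. The following is a rough sketch, not a prescribed method: `multiplier_for` is a hypothetical helper name, and the standard error 0.0205 is borrowed from the Web Design example purely to show how the margin grows with the confidence level.

```python
from statistics import NormalDist

def multiplier_for(confidence_level):
    """Return z* such that the middle `confidence_level` of N(0, 1) lies within +/- z*."""
    left_area = 1 - (1 - confidence_level) / 2   # e.g. 0.95 -> 0.975
    return NormalDist().inv_cdf(left_area)

standard_error = 0.0205  # SE from the Web Design example, reused for illustration

rows = []
for level in (0.90, 0.95, 0.99):
    z_star = multiplier_for(level)
    rows.append((level, z_star, z_star * standard_error))

for level, z_star, margin in rows:
    print(f"{level:.0%}  z* = {z_star:.3f}  margin of error = {margin:.4f}")
```

With the standard error held fixed, a larger multiplier gives a larger margin, which is why more confidence requires a wider interval.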
The Logic of Confidence Intervals
■ Consider all possible random samples of the same large size n.
■ Each possible random sample provides a possible sample statistic
value.
■ If we made a histogram of all of these possible statistics it would
look like the normal distribution.
■ About 95% of the possible sample statistics will be in the interval
𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟 ± 1.96 ∗ 𝑆𝐸
■ and for each one of these sample statistic values, the interval
𝑝𝑜𝑖𝑛𝑡 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒 ± 1.96 ∗ 𝑆𝐸 will contain the population parameter.
■ Thus about 95% of the intervals
𝑝𝑜𝑖𝑛𝑡 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒 ± 1.96 ∗ 𝑆𝐸
will contain the population parameter.
■ Note: The ± part of the interval, 1.96 ∗ 𝑆𝐸, is called the 95% margin of
error.
81
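The repeated-sampling argument above can be tried out with a small simulation. This is only a sketch under stated assumptions: the population proportion 0.60, the sample size, and the number of samples are hypothetical, and the standard error formula sqrt(p̂(1 − p̂)/n) for a sample proportion is assumed here rather than taken from this chapter.

```python
import random
from math import sqrt

random.seed(200)  # fixed seed so the run is repeatable

true_proportion = 0.60   # hypothetical population proportion
sample_size = 100
num_samples = 2000
multiplier = 1.96

covered = 0
for _ in range(num_samples):
    # Draw one random sample and compute its sample proportion.
    successes = sum(random.random() < true_proportion for _ in range(sample_size))
    p_hat = successes / sample_size
    # Assumed SE formula for a sample proportion (not derived in this chapter).
    se = sqrt(p_hat * (1 - p_hat) / sample_size)
    # Count the interval if it contains the true proportion.
    if p_hat - multiplier * se <= true_proportion <= p_hat + multiplier * se:
        covered += 1

coverage = covered / num_samples
print(f"{coverage:.1%} of the {num_samples} intervals contain the true proportion")
```

The share of intervals that capture the true proportion should land near 95%, which illustrates that the confidence level describes the procedure over many samples rather than any single interval.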


Chapter2 slides-part 2-harish complete

  • 1. STT 200 STATISTICAL METHODS Chapter 2: Foundation for Inference – Part 2
  • 2. 2.5 The Central Limit Theorem ■ Here are a few of the null distributions we’ve looked at throughout this chapter. ■ What patterns can you identify regarding their shape? The null distribution for testing independence of precipitation vs. type of day of the week. The null distribution for testing whether a new sales pitch is better than a current one. 2
  • 3. Government shutdown The null distribution for testing whether network political inclination is independent of reported polling results about the government shutdown. 3
  • 4. The Central Limit Theorem (CLT) - 1 ■ In Chapter 1, we learned distributions are often left- or right- skewed. ■ However, these null distributions are symmetric and relatively bell-shaped. ■ Do you think this is a coincidence? ■ TOP HAT ■ This is NOT a coincidence! The shape of these distributions is, in fact, mathematically guaranteed by the Central Limit Theorem. ■ It says that under certain conditions, in the limit, certain sample statistics are approximately normally distributed! ■ Average behavior is normal. ■ Prof. Shlomo Levental 4
  • 5. The Central Limit Theorem (CLT) - 2 ■ If we look at a proportion (or difference in proportions) and the scenario meets certain conditions, ■ then the sample proportion (or difference in proportions) will appear to follow a bell-shaped curve called a normal distribution. 𝑓 𝑥 = 1 𝜎 2𝜋 𝑒 − 1 2 𝑥−𝜇 𝜎 2 Center is Mean: 𝜇 Spread is SD: 𝜎 The equation of the curve is CENTER/SPREAD 5
  • 6. Conditions for the CLT In order for the CLT to apply, two conditions must be true: 1. Observations in the sample(s) are independent. ■ Independence is often guaranteed in an observational study by taking a random sample from a population. ■ It can also be guaranteed in the context of a controlled experiment if we randomly divide individuals into treatment groups. 2. The sample size is sufficiently large. ■ In order for the null distribution to take on the shape of a normal curve, we must have gathered a sufficiently large sample of data, regardless of whether it is an observational study or controlled experiment. ■ Just how large is large enough? ■ That differs from one context to the next, and we’ll provide guidelines as we encounter them through the rest of the semester. 6
  • 7. 2.6 The Normal Distribution Here are three different normal curves. What do they share in common? Normal curves always have the following five characteristics: 1. Unimodal (single peak) 2. Symmetic 3. Bell Shaped (or Mound Shaped) 4. Center is the mean and Spread is the standard deviation 5. Area under any normal curve is probability and Total area/probability is 1 7
  • 8. Shape of the Normal Curve ■ Despite these common characteristics, normal distributions can look quite different, as you can see above. ■ Specifically, the normal distribution can be adjusted using two parameters, the mean and the standard deviation. ■ Change Center: Changing the mean of a normal curve shifts the curve to the left or right. ■ Change Spread: Changing the standard deviation of a normal curve stretches or constricts the curve around the mean. 8
  • 9. Labelling the Normal Curve ■ If a normal curve has mean 𝜇 and standard deviation 𝜎, statisticians will write the distribution as the 𝑵 𝝁, 𝝈 distribution. ■ The three distributions above can be written (from left to right) as the 𝑁(0,1), the 𝑁(1,1.5) and 𝑁(−2, 0.7) distributions. 9
  • 10. Standard Normal Distribution ■ Because the mean and standard deviation describe a normal distribution exactly, they are called the distribution’s parameters. ■ MATRIX Movie: NEO ■ SPECIAL NORMAL DISTRIBUTION: When a normal curve has mean 𝜇 = 0 and standard deviation 𝜎 = 1, we label the curve the Standard normal or N(0,1) or Z curve. 10
  • 11. Using Calculator to find Probabilities, Areas and Percentiles on Page 81 To find a probability if a data value is known: 2nd Vars – “normalcdf” – enter “lower limit, upper limit, mean, sd” Example: 𝑃(900 ≤ 𝑋 ≤ 1200) where 𝜇 = 1060 𝑎𝑛𝑑 𝜎 = 195 Enter 2nd Vars – normalcdf (900, 1200, 1060, 195) enter. Answer 0.557644 To find data values when given an area (or percentage): 2nd Vars – “invnorm” – enter (enter area to the left as decimal, mean, sd) Example: Find the score or data value corresponding to the 80th percentile where 𝜇 = 1060 𝑎𝑛𝑑 𝜎 = 195 2nd Vars – “invnorm” – (0.80, 1060, 195) enter Answer: 1224 Step-by-step instructions with examples at https://msu.edu/~fairbour/MSU/CalculatorHelps/Normal CurveCalcInstructions.pdf Normal tables in olden days!!! 11
  • 12. Example: SAT scores ■ Cumulative SAT scores are approximated well by a normal model, 𝑁(1060, 195). ■ Provide a sketch of the approximating normal curve. 12
  • 13. Applying Z Scores: SAT scores 1 1. Approximately what proportion of test takers score between 900 and 1200 on the SAT? Given: Data Values To find: Proportion/Probability/Area Sketch and label center data values Press 2nd Press Vars Choose Normalcdf Lower: 900 Upper:1200 Mean: 1060 SD: 195 Answer: 0.5576 13
  • 14. Finding probabilities – known data values Step 1: Sketch a picture of the area you’re trying to find. Step 2: Compute the area using a calculator / computer software. ■ To find a probability if a data value is known: – 2ND VARS – “normalcdf” – enter “lower limit, upper limit, mean, sd” ■ Example: 𝑃(900 ≤ 𝑋 ≤ 1200) – Press 2nd Vars – normalcdf (900, 1200, 1060, 195) Enter. – Answer 0.557644 14
  • 15. Applying Z Scores: SAT scores 2 2. A randomly-selected SAT test-taker is about to sit for the test. Nothing is known about her aptitude. What is the probability that she scores at least 1300 on her SATs? Given: Data Values To find: Probability/Area 2nd Vars Normalcdf Lower: 1300 Upper: 𝟏𝟎 𝟏𝟎 Mean: 1060 SD: 195 Answer: 0.1092 15
  • 16. Applying Z Scores: SAT scores 3 c. Another SAT test-taker is taking the SAT for a second time after earning a 1100 on his first attempt. What was the percentile of his first score? DO THE FOLLOWING NOW!!! ■ Sketch the normal curve ■ Label Center ■ Mark the data values ■ Shade the required area ■ Find the probability using GC ■ Think: What is the lower limit? ■ TOP HAT 16
  • 17. Applying Z Scores: SAT scores 3 c. Another SAT test-taker is taking the SAT for a second time after earning a 1100 on his first attempt. What was the percentile of his first score? Answer: ■ Normalcdf 𝑳𝒐𝒘𝒆𝒓: −𝟏𝟎 𝟏𝟎 Upper: 1100 Mean: 1060 SD: 195 Answer: 58th Percentile 17
  • 18. Applying Z Scores: SAT scores 4 d. What is the SAT score of someone who scores at the 80th percentile? Given: Percentile/Probability To find: Data value Should we use Normalcdf??? No, USE 2nd Vars invNorm Must enter left side area!!! Area: 0.80 Mean: 1060 SD: 195 Answer: 1224 ??? 18
  • 19. Finding data values with known area Step 1: Sketch a picture with the data value you’re trying to find. Step 2: Compute the data value using a calculator / computer software. ■ To find data values when given an area (or percentage): – 2ND VARS– “invnorm” –(enter area to the left as a decimal, mean, sd) ■ Example: Find data value corresponding to 80th percentile. – 2ND VARS – invNorm(0.80, 1060, 195) enter – Answer: 1224.116 19
  • 20. Standardizing with Z scores: Formula! ■ Often, it is valuable to quantify how far an observation falls from its mean or expected value. ■ Recall that the SD gives us the typical average distance an observation falls from its mean or expected value ■ Standardized score or z-score: 𝑧 = 𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑣𝑎𝑙𝑢𝑒 − 𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑣𝑎𝑙𝑢𝑒 𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = 𝑥 − 𝜇 𝜎 ■ We can interpret a z-score as quantifying the number of standard deviations an observation falls from its mean or expected value. ■ In using this formula, some Normal Variable X is converted to Standard Normal Variable Z ■ There is an interesting connection between the area under the normal curves of X and Z. 20
  • 21. There are two major tests of readiness for college, the ACT and the SAT. ■ ACT scores are reported on a scale from 1 to 36. The distribution of ACT scores for more than 1 million students in a recent high school graduating class was mound-shaped and symmetric with – mean = 20.8 and sd = 4.8. ■ SAT scores are reported on a scale from 400 to 1600. The SAT scores for 1.4 million students in the same graduating class was mound-shaped and symmetric with – mean = 1060 and sd = 195. ■ Tonya scores 1320 on the SAT. Jessie scores 28 on the ACT. ■ Both seemed to have done well. ■ But, who did better in their respective test, can you tell? ACT vs. SAT 21
  • 22. Who did better on college prep test? ■ ACT: average = 20.8, SD = 4.8 ■ SAT: average = 1060, SD = 195 – Tonya scores 1320 on the SAT. Jessie scores 28 on the ACT. Assuming that both tests measure the same thing, who has the higher score (relatively)? ■ Question: Who did better? ■ Who is further away from the mean? ■ Calculate the z-scores: ■ TOP HAT  − = x z 22
  • 23. Interpreting z-scores Extra question ■ Police department salaries in San Francisco have a mean of $90,702 and an sd of $45,321. ■ A chief of police’s salary has a z-score of z = 4.84. Interpret this z-score. – A. The chief of police makes 4.84 times as much as the average employee salary. – B. The chief of police’s salary is nearly 5 standard deviations above the average employee salary. – C. Only 4.84% of employees make more than the chief of police. ■ TOP HAT 23
  • 24. Z-score formula preserves … ■ Use normalcdf to calculate the approximate percentage of students who scored better than Jessie on the ACT. ■ Use normalcdf to calculate the area above 𝐳 = 𝟏. 𝟓 using the 𝑁(0, 1) distribution. Recall, Jessie’s z- score is 1.5. PAGE 82: Add this before finding probabilities with z-scores. ■ What do you notice about the probabilities? ■ TOP HAT 24
  • 25. Z-score formula preserves … P(Z>1.5) =0.0668P(ACT>28) =0.0668 Recall, that Jessie ACT score was 28 and her z-score was 1.5. 25
  • 26. Z-score formula preserves probability ■ Use normalcdf to calculate the approximate percentage of students who scored better than Jessie on the ACT. ■ Use normalcdf to calculate the area above 𝐳 = 𝟏. 𝟓 using the 𝑁(0, 1) distribution. Recall, Jessie’s z-score is 1.5. PAGE 82: Add this before finding probabilities with z-scores. ■ What do you notice about the probabilities? ■ The probabilities are the same. ■ Why? ■ Z-score formula preserves area under normal curves (probability). P(ACT < a) = P( Z < a*) where a* is the z-score of a. 26
  • 27. Finding probabilities for z scores (1) 1. Find P(−1 ≤ 𝑍 ≤ 1) In words, find the probability that the standard normal variable takes values within one standard deviation of the mean. Hint: Remember Matrix? Neo! Use normalcdf with 𝑵(𝟎, 𝟏) Lower: -1 Upper: 1 Mean: 0 SD: 1 Answer: 0.6827 27
  • 28. Finding probabilities for z scores (2) 2. What is the probability that a standard normal variable Z is within 2 standard deviations of mean? That is, find P(-2 ≤ Z ≤ 2). ■ TOP HAT 28
  • 29. Finding probabilities for z-scores (3) 2. What is the probability that a standard normal variable Z is within 2 standard deviations of mean? That is, find P(-2 ≤ Z ≤ 2). ■ TOP HAT ■ Answer: normalcdf (-2, 2, 0, 1) = 0.9545 3. Find P(-3 ≤ Z ≤ 3). Answer: normalcdf (-3, 3, 0, 1) = 0.9973 29
  • 30. Normal curves and the empirical rule For data that follows a normal distribution, ■ Approximately 68% of the data will have a z-score between -1 and 1. ■ Approximately 95% of the data will have a z-score between -2 and 2. ■ Approximately 99.7% of the data will have a z-score between -3 and 3. ■ So, in general: ■ 68%, 95% and 99.7% lie within one, two and three SDs of the mean. 30
  • 31. Finding z-scores for probabilities 4. What z-scores provide the bounds for the middle 50% of the standard normal distribution? ■ THINK: What do we want to find? Two bounds so middle area is 0.5. ■ Let us call this bound as a. ■ Then this is –a by symmetry. So, just need to find a. ■ THINK: Which function should we use in GC? TOP HAT ■ THINK: Middle Area is 0.5 then what is the left side area for the negative bound -a? ■ THINK: Middle Area is 0.5 then what is the left side area for the positive bound a? TOP HAT Z~N(0,1) What is this z-score? 31
  • 32. Finding z-scores for probabilities 4. What z-scores provide the bounds for the middle 50% of the standard normal distribution? On the Z curve we have middle 50% Remaining area is 50%, so 25% on each side! 32
  • 33. Finding z-scores for probabilities 4. What z-scores provide the bounds for the middle 50% of the standard normal distribution? Left side area for a is 0.75 and not 75, must be a decimal between 0 and 1. Answer: invNorm(0.75, 0, 1) or invNorm(0.25, 0, 1) -0.674 and 0.674 are the bounds for middle 50% 33
  • 34. Finding z-scores for probabilities 5. What z-scores provide the bounds for the middle 95% of the standard normal distribution? ■ THINK: Which function? ■ THINK: What is the left side area? ■ TOP HAT 34
  • 35. Finding z-scores for probabilities 5. What z-scores provide the bounds for the middle 95% of the standard normal distribution? ■ THINK: Which function? ■ THINK: What is the left side area? ■ TOP HAT ■ Answer: invNorm (0.975, 0, 1) = 1.96 ■ So, -1.96 and 1.96 are the bounds for the middle 95% of the standard normal distribution. 35
  • 36. Example: IQ scores (1) ■ IQ test scores are formulated to be normally distributed, that is, they follow the shape of a normal curve. Suppose we have the following sample of 30 IQ scores: ■ Let us verify whether these scores are from a Normal model: Mean: 97.1 and SD: 11.7. 1. How many of these scores are within 1 standard deviation of the mean? HINT: Count them! ■ First find the range of IQ scores that are within 1 SD of the mean. ■ Mean ± 1 x SD = 97.1 ± 11.7. So, between 85.4 and 108.8 65 80 81 83 85 89 90 91 91 92 94 95 97 97 97 97 99 100 101 101 101 102 104 105 106 107 109 112 120 121 36
  • 37. Example: IQ scores (1) ■ IQ test scores are formulated to be normally distributed, that is, they follow the shape of a normal curve. Suppose we have the following sample of 30 IQ scores: 1. How many of these scores are within 1 standard deviation of the mean? HINT: Count them! ■ We count 21 IQ scores between 85.4 and 108.8 ■ So, 21/30 = 70% of IQ scores lie within 1 SD of the mean 65 80 81 83 85 89 90 91 91 92 94 95 97 97 97 97 99 100 101 101 101 102 104 105 106 107 109 112 120 121 37
  • 38. Example: IQ scores (2) ■ IQ test scores are known to be normally distributed, that is, they follow the shape of a normal curve. Suppose we have the following sample of 30 IQ scores: ■ How many of these scores are within 2 standard deviations of the mean? ■ Mean ± 2 x SD = 97.1 ± 2 x 11.7 ■ That is, between 73.7 and 120.5 ■ We count 28 of the scores, so 28/30 = 93.33% 65 80 81 83 85 89 90 91 91 92 94 95 97 97 97 97 99 100 101 101 101 102 104 105 106 107 109 112 120 121 38
  • 39. Example: IQ scores (3) ■ IQ test scores are known to be normally distributed, that is, they follow the shape of a normal curve. Suppose we have the following sample of 30 IQ scores: ■ How many of these scores are within 3 standard deviations of the mean? ■ Mean ± 3 x SD = 97.1 ± 3 x 11.7 ■ All of them. So, 30/30 = 100%. ■ Do you think the data follows the empirical rule well? ■ Yes, the data seems to follow the empirical rule. So, we can believe that this data is from a Normal population. 65 80 81 83 85 89 90 91 91 92 94 95 97 97 97 97 99 100 101 101 101 102 104 105 106 107 109 112 120 121 39
  • 40. Evaluating the normal approximation ■ Many data sets can be well-approximated by the normal distribution. ■ We saw earlier that SAT scores, ACT scores and IQ scores are well-approximated by the normal model. ■ While these models are helpful and convenient, we must remember that they are only an approximation. ■ Often, it is important to evaluate just how good (or bad) of an approximation the normal model is when applied to a scenario. ■ There are two simple visual ways to assess whether a normal approximation is appropriate: 1. Histogram 2. QQ-plot (normal probability plot) 40
  • 41. Histograms and QQ plots ■ Here are the histogram and QQ plot for the IQ score example: ■ A QQ-plot is short for quantile-quantile plot ■ (quantile = percentile rank) ■ Sample’s quantiles/percentile ranks on the vertical axis and Z’s quantiles/percentile ranks on the horizontal axis ■ For example, sample Q1 will be matched with Z’s Q1, etc. ■ In a QQ-plot, the closer the dots are to a perfect straight line, the more confident we can be that our data follow the normal model. 41
  • 42. Histograms and QQ-plots 1 Consider the following histograms of data sets along with the QQ-plots: 42
  • 44. Histograms and QQ-plots 3 For which of the data sets would you recommend using a normal curve to model the distribution? TOP HAT 44
  • 45. 2.7 Applying the normal model (1) Standard Error ■ Sample statistics or Point estimates vary from sample to sample, and it is often valuable to quantify that variability with what is called the standard error (SE). ■ The standard error of a point estimate is approximately equal to the standard deviation associated with the estimate. ■ For example, if we look at the normal approximation of the distribution of sample proportions, then the standard error will be used as the standard deviation of sample proportions, 𝑺. 𝑬ෝ𝒑 45
  • 46. Applying the normal model (2) ■ For instance, if we had a sample statistic/point estimate with 𝑆𝐸 = 4.2 units, that would mean that this point estimate, over many repeated samples, would be approximately 4.2 units away from the parameter it estimates, on average. ■ The way we compute the 𝑆𝐸 of a point estimate varies for different types of point estimates. ■ We will cover these computations in more detail in later chapters. For now, let’s return to some familiar research scenarios. 46
  • 47. Example: “Boomeranging” (1) ■ We are on PAGE 86 ■ Recall the research study that investigated whether the rate at which men ‘boomeranged’ back to their parents’ homes as adults had changed from its 1997 level of 13%. ■ The researchers took a random sample of 150 adult men and found that 25 of them had left their parental home and then returned. The hypotheses that were tested were: ■ 𝐻0: There has been no change in the rate of ‘boomeranging’ among young men. The percentage is still 13% and any difference in the sample is due to chance. ■ 𝐻 𝑎: There has been a change in the rate of ‘boomeranging’ for young men. 47
  • 49. Example: “Boomeranging” (2) 1. Use the information above to report a point estimate for the current rate of ‘boomeranging’ among young men, the approximate p-value from this randomization test, and then evaluate the evidence with regards to the associated hypotheses. ■ Sample proportion or Point estimate = 25/150 = 0.1667 ■ The approximate p-value from two tails is 0.2265 ■ There is very little evidence that the null model is not a good fit for the observed results. 2. Now try to reproduce this p-value using a normal distribution approach. 49
  • 50. Example: “Boomeranging” (3) 2. Create a quick sketch in the space below of a normal distribution centered at 0.13 with a standard error of 0.0275. 50
  • 51. Example: “Boomeranging” (4) Interpret the standard error of 0.0275. Hint for Exam2: ■ Here our sample proportion of boomerangers 𝐡𝐚𝐬 𝒂 𝒔𝒕𝒂𝒏𝒅𝒂𝒓𝒅 𝒆𝒓𝒓𝒐𝒓, 𝑺𝑬 = 𝟎. 𝟎𝟐𝟕𝟓. ■ INTERPRETATION: This means that point estimates/sample proportions, on average, over many repeated samples, would be approximately 0.0275 units away from the parameter/population proportion of boomerangers of 0.13. 51
  • 52. Example: “Boomeranging” (5) 2. Calculate the Z score using the observed ‘boomeranging’ rate of 25 150 = 0.1667, along with the mean and standard error of the normal model. [Notice how we use the standard error of the statistic as the standard deviation to find the z score.] 𝑧 = 0.1667 − 0.13 0.0275 = 1.3345  − = x z 52
  • 53. Example: “Boomeranging” (6) 3. Identify the p-value corresponding to this z score. ■ Recall, area under Z curve is the same as corresponding area under the normal curve for sample proportions ■ normalcdf(1.3345, 𝟏𝟎 𝟏𝟎 , 0, 1) ■ Right tail area = 0.091. Or could do, normalcdf(0.1667, 𝟏𝟎 𝟏𝟎, 𝟎. 𝟏𝟑, 𝟎. 𝟎𝟐𝟕𝟓) This is not recommended, as later on, we have different formulas for S.E. and it gets confusing. 1.3345 53
  • 54. Example: “Boomeranging” (7) Recall: Observed boomeranging rate is 0.1667 𝑧 = 1.3345 ■ What is the definition of p-value? ■ Under the null model, the probability of observing a Boomeranging rate of 0.1667 and beyond, that is, as far away as 0.1667 is from 0.13 or worse. ■ TOP HAT: To find p-value should we multiply the right tail area by 2? How can we tell whether to multiply by 2 or not??? 54
  • 55. Example: “Boomeranging” (8) 3. Identify the p-value corresponding to this z score. How does it compare to the p-value from the randomization simulation? Would we make the same evaluation regarding the null model? ■ In terms of z-scores, it is the probability of observing a z-score of 1.3345 and beyond, that is, z-scores less than -1.3345 and z-scores greater than 1.3345. ■ We have a two tailed test, so by symmetry, p-value = 2 x 0.091 = 0.182 ■ For randomization based test the approximate p-value was 0.2265. ■ Here, we have a slightly smaller p-value but the evaluation is still the same! ■ There is LITTLE evidence that the null model IS NOT a good fit for the observed results. 1.3345-1.3345 55
  • 56. Example: Web Design (1) ■ Recall the observational study undertaken by an online art gallery to see whether investing in a website redesign would increase the percentage of premium accounts from its current rate of 25%. ■ The gallery surveyed 500 users and found that 150 of them say they would continue or purchase a premium account if the new features were included. The hypotheses tested were: ■ 𝐻0: The percentage of premium accounts will not change after including additional features on the website; it will remain at 25%. Any difference from this rate is due to chance involved in the sampling process. ■ 𝐻 𝑎: The percentage of premium accounts will change after including additional features on the website. 56
  • 57. Example: Web Design (2) From the randomization test, we could see that the p-value was small and we were inclined to think that the percentage of premium accounts might increase. 1. Now try to reproduce this p-value using a normal distribution approach. Create a quick sketch in the space below of a normal distribution centered at 0.25 with a standard error of 0.019. 57
  • 58. Example: Web Design (3) 2. Calculate a Z score using the observed ‘boomeranging’ rate of 25 150 = 0.3, along with the mean and standard error of the normal model. [Notice how we use the standard error of the statistic as the standard deviation for the z score.] ■ TOP HAT: To find p-value should we multiply the right tail area by 2? How can we tell whether to multiply by 2 or not???  − = x z 𝑧 = 0.3 − 0.25 0.019 = 2.6316 58
  • 59. Example: Web Design (4) 3. Identify the p-value corresponding to this z score. How does it compare to the p-value from the randomization simulation? Would we make the same evaluation regarding the null hypothesis? ■ Normalcdf (2.6316, 𝟏𝟎 𝟏𝟎, 0, 1) ■ P-value = 0.0042 (one-tail test) ■ From Section 2.4, for the randomization based test, we had a p-value less than 0.001, that is, extremely strong evidence that null model is not a good fit for our data. ■ Whereas, here we only have very strong evidence that null model is not a good fit. ■ In both cases, we will recommend redesign of website to the owners. 59
  • 60. 2.8 Confidence Intervals We are on PAGE 88 ■ A sample statistic provides a single plausible value for a population parameter using collected sample data. ■ That is why, we call this sample statistic a point estimate. ■ Sometimes, it is more useful to provide a plausible range of values for that parameter. ■ Statisticians call this plausible range of values a confidence interval. ■ Suppose we have a large bin filled with small green and white balls and we want to know the proportion of white balls in the bin/box/population. 60
  • 61. Example: Estimating the proportion Question: What is the proportion of white balls in the bin? In this scenario, we could count all the balls and find the population proportion, but that would take too much time. ■ Instead, let’s take a random sample of, say, 100 balls. ■ If our sample of 100 balls has 59 white and 41 green balls, what is the sample proportion of white balls? ■ Sample proportion of white balls: $\hat{p}_{white} = 59/100 = 0.59$ ■ We call this sample proportion a point estimate. ■ Based on this point estimate, give a plausible range of values for the population proportion. ■ How confident are you that your answer is correct? TOP HAT
  • 62. Sample estimates vary ■ We know based on experience that sample estimates vary, and that under certain conditions they will follow a normal model. ■ The graph shows 3000 sample proportions for this scenario. ■ Our sample proportion of $\hat{p} = 0.59$ is somewhere in this distribution, and likely near the center of the distribution but not at the center.
  • 63. Standard Error ■ Say, we can also estimate the standard error to be 0.048. ■ Since we’re talking about a sampling model, we call this standard deviation the standard error. ■ We will learn how to calculate the standard error in chapter 3. 63
  • 64. Range of plausible values ■ Sample proportion is $\hat{p} = 0.59$ ■ Standard error is 0.048 ■ Use what you know about the normal distribution to construct a new range of plausible values for the population proportion of white balls. ■ Hint: Use the empirical rule! ■ How confident are you that your answer is correct? TOP HAT ■ App for confidence intervals https://shiny.stt.msu.edu/fairbour/Confidence/ ■ How likely is it that the interval includes the true population proportion? TOP HAT ■ How many out of 100 intervals do you expect to contain the true population proportion? TOP HAT
  • 65. The gist of confidence intervals (1) 1. The value of the sample estimate will vary from one sample to the next. ■ The values vary around the population parameter. 2. The standard error of the sample estimate provides an idea of how far away it would tend to vary from the parameter value (on average). 3. The general format for a confidence interval is given by: ■ Sample (point) estimate ± (a few) standard errors 4. The “few”, or number of standard errors we go out each way from the sample estimate, will depend on the coverage rate we want (i.e., how confident we want to be).
  • 67. The gist of confidence intervals (2) 5. The “how confident” we want to be is referred to as the confidence level. ■ This level reflects how confident we are in the procedure. ■ The confidence level is the percentage of the time we expect the procedure to produce an interval that contains the population parameter. ■ Most of the intervals that are made will contain the truth about the population, but occasionally an interval will be produced that does not contain the true parameter value. ■ Each interval either contains the population parameter or it doesn’t. 67
  • 68. How many standard errors? (we are at the bottom of page 89) ■ This depends on the confidence level. ■ Given the standard normal distribution, what are the boundaries for the middle 95%? ■ Use invNorm(0.975, 0, 1); see page 83 in the Lecture guide. ■ −1.96 and 1.96 are the boundaries of the middle 95%. ■ If we set “a few” to be z = 1.96, then we can expect that 95% of the sample proportions will be in the interval population parameter ± 1.96 standard errors ■ This is a fact. ■ But recall that we are estimating the population parameter, meaning we do not know the population parameter. ■ We know our point estimate/sample proportion!
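The boundaries of the middle 95% can be checked without a calculator. This is a small sketch that assumes Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method plays the role of invNorm; the variable names are illustrative.

```python
from statistics import NormalDist

confidence = 0.95
tail = (1 - confidence) / 2          # area left in each tail: 0.025

standard_normal = NormalDist(mu=0, sigma=1)
upper = standard_normal.inv_cdf(1 - tail)   # same role as invNorm(0.975, 0, 1)
lower = standard_normal.inv_cdf(tail)       # same role as invNorm(0.025, 0, 1)

print(f"middle {confidence:.0%} boundaries: {lower:.2f} and {upper:.2f}")
```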
  • 69. Procedure and confidence ■ Let us look at this the other way round: point estimate ± 1.96 standard errors
  • 70. Constructing Confidence Intervals Calculate the interval point estimate ± 1.96 standard errors for each of the following samples, using a standard error of 0.048: ■ Sample 1 has point estimate $\hat{p} = 0.59$ ■ Answer: (0.49592, 0.68408) ■ Sample 2 has $\hat{p} = 0.71$ ■ Answer: (0.61592, 0.80408) ■ Sample 3 has $\hat{p} = 0.75$ ■ Answer: (0.65592, 0.84408)
  • 71. Which intervals include the parameter? ■ In this example, we actually know the true proportion is $p = 0.66$. ■ Which of the intervals you calculated included this value? ■ Intervals: (0.49592, 0.68408), (0.61592, 0.80408), (0.65592, 0.84408) ■ Population proportion = 0.66
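The three intervals and the check against the known proportion can be sketched in a few lines of Python. The values come from the slides; the variable names and the `results` list are illustrative.

```python
se = 0.048       # standard error given on the slides
z_star = 1.96    # multiplier for 95% confidence
p_true = 0.66    # population proportion, known only in this example

results = []
for p_hat in (0.59, 0.71, 0.75):
    lower = p_hat - z_star * se
    upper = p_hat + z_star * se
    contains = lower <= p_true <= upper
    results.append((lower, upper, contains))
    print(f"p_hat = {p_hat:.2f}: ({lower:.5f}, {upper:.5f}) "
          f"contains {p_true}? {contains}")
```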
  • 72. Key Idea ■ Because 95% of sample proportions are within 1.96 standard errors of the population parameter, approximately 95% of the intervals we create using this procedure will include the parameter. ■ More practice with this idea at https://shiny.stt.msu.edu/fairbour/Confidence/
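The key idea can be explored with a small simulation. This is only a sketch under stated assumptions: samples of 100 balls are drawn from a bin with true proportion 0.66, the standard error is held at the 0.048 given on the slides, and the seed, number of samples, and variable names are arbitrary choices rather than anything from the course materials.

```python
import random

rng = random.Random(200)     # fixed seed so the run is repeatable (arbitrary choice)
p_true, n = 0.66, 100        # population proportion and sample size from the slides
se, z_star = 0.048, 1.96     # standard error and 95% multiplier from the slides
n_samples = 2000             # number of simulated samples (arbitrary choice)

hits = 0
for _ in range(n_samples):
    # Draw n balls; each is white with probability p_true
    whites = sum(rng.random() < p_true for _ in range(n))
    p_hat = whites / n
    # Does this sample's interval capture the true proportion?
    if p_hat - z_star * se <= p_true <= p_hat + z_star * se:
        hits += 1

coverage = hits / n_samples
print(f"{coverage:.1%} of {n_samples} intervals contain p = {p_true}")
```

The printed share should land near 95%, though it will not be exactly 95% in any single run.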
  • 73. Example 1: Pass or Fail? ■ Earlier, we encountered a research scenario where an engineering instructor examined whether students with an urban/suburban background are more likely to pass the course than rural/small-town students. The table below shows the results of this study.
      Student Background    Pass   Fail   Total
      Urban/Suburban          52     13      65
      Rural/Small-town        30     25      55
      Total                   82     38     120
  • 74. Example 1: Pass or Fail? Interval ■ The point estimate suggests that students from an urban/suburban background are more likely to pass the course: ■ $\hat{p}_{Urban} - \hat{p}_{Rural} = \frac{52}{65} - \frac{30}{55} = 0.255$. ■ The standard error of this estimate is $SE = 0.0852$. ■ Construct a 95% confidence interval for the true difference in the proportions of urban and rural students that pass the course. point estimate ± 1.96 standard errors = 0.255 ± 1.96 × 0.0852 ■ The 95% confidence interval for the true difference in the proportions of urban and rural students that pass the course is (0.088008, 0.421992). ■ 0.088008 is called the lower bound of the confidence interval. ■ 0.421992 is called the upper bound of the confidence interval. ■ The plausible values for the true difference are between 8.8% and 42.2%.
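The arithmetic for this interval can be sketched as follows. The counts and the standard error come from the slides; the point estimate is rounded to three decimals before use so that the bounds match the slide's figures, and the variable names are illustrative.

```python
p_urban = 52 / 65    # pass rate, urban/suburban students
p_rural = 30 / 55    # pass rate, rural/small-town students

# Rounded to 0.255 to match the slide; the unrounded difference is about 0.2545
estimate = round(p_urban - p_rural, 3)

se = 0.0852          # standard error given on the slide
z_star = 1.96        # multiplier for 95% confidence

margin = z_star * se
lower, upper = estimate - margin, estimate + margin
print(f"95% CI for the difference: ({lower:.6f}, {upper:.6f})")
```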
  • 75. Example 2: Web Design Interval ■ Recall the observational study undertaken by an online art gallery to see whether investing in a website redesign would increase the percentage of premium accounts from its current rate of 25%. The gallerists surveyed 500 users and found that 150 of them said they would purchase a premium account if the new features were included. ■ Earlier, we used the point estimate $\hat{p} = 0.3$ and its standard error $SE = 0.0205$ to conduct a hypothesis test based on the normal distribution. ■ Use these same values to create a 95% confidence interval for the proportion of Premium account users. TOP HAT
  • 76. Example 2: Web Design Interval Interpretation ■ Answer: The 95% CI for the proportion of Premium users is (0.25982, 0.34018). ■ Notice that the value 0.25 does not fall within the 95% confidence interval! ■ We can interpret this to mean that 0.25 is not a plausible value for the true percentage of premium accounts. This is consistent with our evaluation of the evidence in the hypothesis test we conducted earlier.
  • 77. Interpretation ■ The phrase confidence level is used to describe the likelihood or chance that a yet-to-be-constructed interval will actually contain the true population value. ■ However, we have to be careful about how to interpret this level of confidence if we have already computed our interval of values. ■ The population parameter is not a random quantity; it does not vary. Once we have “looked” at (computed) the actual interval, we cannot talk about probability or chance for this particular interval anymore. ■ Unlike in the movie “Harry Potter”! ■ https://youtu.be/81scFUYQGbU?t=78 ■ The 95% confidence level applies to the procedure, not to an individual interval; it applies “before you look” and not “after you look” at your data and compute your observed interval of values.
  • 78. Changing the confidence level - 1 ■ Suppose we want to create a range of plausible values for a parameter that will have more than 95% confidence [i.e., create the interval using a process that will capture the parameter more than 95% of the time]. ■ Let’s return to our original formula for a 95% confidence interval: 𝑝𝑜𝑖𝑛𝑡 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒 ± 1.96 ∗ 𝑆𝐸 ■ Notice this interval has three components: the point estimate, the standard error of that estimate, and a multiplier of 1.96. ■ invNorm(0.975, 0, 1) = 1.96 ■ Recall that we chose this multiplier of 1.96 earlier after observing that 95% of observations of a normally-distributed variable fall within 1.96 standard deviations of the mean. 78
  • 79. Changing the confidence level - 2 ■ invNorm(0.995, 0, 1) = 2.576 ■ invNorm(0.95, 0, 1) = 1.645 ■ By extension, we could observe that 99% of observations fall within 2.576 standard deviations of the mean, and that only 90% of observations fall within 1.645 standard deviations of the mean. ■ If the point estimate of a parameter follows a normal model with standard error 𝑆𝐸, then a confidence interval for that population parameter is: 𝑝𝑜𝑖𝑛𝑡 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒 ± 𝑧∗ ∗ 𝑆𝐸 where 𝑧∗ corresponds to the level you’d like the confidence interval to have. 79
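The general formula can be wrapped in a short helper. This is a minimal sketch: the function name `confidence_interval` and its signature are hypothetical, `statistics.NormalDist.inv_cdf` stands in for invNorm, and the Web Design values ($\hat{p} = 0.3$, $SE = 0.0205$) from the slides are used purely as an illustration.

```python
from statistics import NormalDist

def confidence_interval(point_estimate, se, level=0.95):
    """Return (lower, upper) for point estimate ± z* × SE under a normal model."""
    # z* leaves (1 - level) / 2 in each tail, e.g. invNorm(0.975, 0, 1) for 95%
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    margin = z_star * se
    return point_estimate - margin, point_estimate + margin

widths = []
for level in (0.90, 0.95, 0.99):
    lower, upper = confidence_interval(0.3, 0.0205, level)
    widths.append(upper - lower)
    print(f"{level:.0%} CI: ({lower:.4f}, {upper:.4f}), width {upper - lower:.4f}")
```

Running it at several levels shows the interval widening as the confidence level rises.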
  • 80. Confidence level multipliers ■ Use the table below to jot down some confidence levels that are commonly seen in statistical studies, along with their associated multipliers. ■ After looking at this table, you’ll probably notice a key idea underlying confidence intervals: ■ If you want to be more confident in your interval of plausible values, you need to make your interval wider.
      Confidence Level    Multiplier $z^*$
      90%                 1.645
      95%                 1.96
      99%                 2.576
  • 81. The Logic of Confidence Intervals ■ Consider all possible random samples of the same large size n. ■ Each possible random sample provides a possible sample statistic value. ■ If we made a histogram of all of these possible statistics, it would look like the normal distribution. ■ About 95% of the possible sample statistics will be in the interval 𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟 ± 1.96 ∗ 𝑆𝐸 ■ and for each one of these sample statistic values, the interval 𝑝𝑜𝑖𝑛𝑡 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒 ± 1.96 ∗ 𝑆𝐸 will contain the population parameter. ■ Thus about 95% of the intervals 𝑝𝑜𝑖𝑛𝑡 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒 ± 1.96 ∗ 𝑆𝐸 will contain the population parameter. ■ Note: The ± part of the interval, 1.96 ∗ 𝑆𝐸, is called the 95% margin of error.