STT 200
STATISTICAL METHODS
Chapter 2: Foundation for
Inference

Research scenario 1: Pass or fail?
■ An instructor in the college of engineering at a large
university looks at the pass rate for an introduction
to chemical engineering course.
■ Research question: Does a student’s background
affect their performance in the course?
■ Using information from one academic year, he finds
that students from urban/suburban backgrounds
pass the course at a rate 25.5 percentage points higher
than students from a rural/small-town background.

Research scenario 2: Government shutdown
■ Grace and Justin are two STT200 students who are
majoring in Journalism and Political Science.
■ Research question: Are the polling data reported
by the two news networks independent of the
networks’ political inclinations?
■ They closely followed the polling results during the
longest government shutdown in US history, which
ended on Friday 25th January 2019 after 35 days.
■ Two days before the end of the shutdown, the New York
Post reported that 637 out of 1062 American
registered voters said President Trump was responsible
for the shutdown while 514 out of 1008 said the same
according to FOX News.

Research scenario 3: Sales pitches
■ A telephone sales solicitor has been given a new
sales pitch to compare against a current sales pitch.
■ Research question: Is the new
approach better, or could any
difference in requests be due to chance?
■ She randomly alternated between them
during a day of calls.
■ Using the new approach,
21 of 100 calls led to requests for the
mailing of additional product information.
■ For the current approach in the other 100 calls, only
14 led to requests for the product information mailing.

Research scenario 4: Pollution and the
Weather
■ A meteorologist is concerned that
pollution associated with heavy
traffic in the local metropolitan area could be
influencing weather patterns.
■ Research question: Does the day of the week affect
the likelihood of precipitation?
■ How pollution influences the weekend weather from www.nature.com.
■ He examines precipitation records from a random
sample of 700 days and finds that the precipitation
rate on weekends differs from the associated rate
for weekdays by 6.5 percentage points.

Statistical inference
■ In each case, the researchers started with a
research question, collected data and found
what looked to be an interesting difference in the
sample data.
■ They may wonder if this difference is also
observable in the population from which the
sample was drawn.
■ Using sample data to infer something about a
population is called statistical inference.

The Scientific Method
1. Make an observation about the world and
formulate a research question.
2. Construct a hypothesis.
3. Test the hypothesis with an experiment.
4. Analyze the data and evaluate the findings.
5. Communicate your results so others can
replicate the experiment and add to scientific
knowledge.

Hypothesis Testing
■ Many research questions can be expressed as two
competing claims that might be correct for a
population.
■ A hypothesis test is a statistical technique used to
evaluate competing claims using sample data.
■ The statements of these claims are called the
null hypothesis and the alternative hypothesis.

Null hypothesis
■ The null hypothesis is often denoted by 𝐻0, and is a
statement that:
– there is no difference,
– there is no relationship or no effect,
– nothing has changed or nothing is happening.
■ The null hypothesis is usually referred to as the
status quo.
■ It is the claim that any differences we see in sample
results compared to the status quo are due to chance,
that is, to uninteresting variation or randomness in
the sampling.

Alternative hypothesis
■ The alternative hypothesis is often denoted by 𝐻𝐴, and is
a statement:
– that there is a relationship,
– that there is a difference,
– that something has changed or something is
happening.
■ It is the claim that the difference in sample results
compared to the status quo is difficult to explain as
randomness and is not due to chance.
■ Usually, the researcher hopes the data will provide strong
evidence that the null model is not a good fit for the data
and, as a consequence, support for the new theory in the
alternative hypothesis.

CAUTION:
■ The null and alternative hypotheses are statements
about the population of interest.
■ They are not statements about the results in the
sample.
■ For randomization-based hypothesis tests, we
write them out using the categorical variables!
■ Note that the Independence Model will be our
Null Model.

Research scenario 1
■ Pass or fail? ~ An instructor in the college of engineering at a large
university looks at the pass rate for an introduction to chemical
engineering course. Using information from one academic year, he
finds that students from urban/suburban backgrounds pass the
course at a rate 25.5 percentage points higher than students from a
rural/small-town background.
■ Population of interest: All students at this university who take the
course.
■ Research question: Does a student’s background affect their
performance in the course?
■ H0: The variables Background and Course success are independent.
A student’s background does not affect whether or not they pass the
course. The observed difference is due to chance.
■ HA: The variables Background and Course success are not
independent. A student’s background does affect whether or not
they pass the course. The observed difference is not due to chance.

Research scenario 2
■ Government shutdown ~ Grace and Justin are two STT200 students who are
majoring in Journalism and Political Science. They closely followed the
polling results during the longest government shutdown in US history, which
ended on Friday 25th January 2019 after 35 days. Two days before the end
of the shutdown, the New York Post reported that 637 out of 1062 American
registered voters said President Trump was responsible for the shutdown
while 514 out of 1008 said the same according to FOX News.
■ Population of Interest: All American registered voters
■ Research Question: Does the political inclination of the news network affect
the polling results the network reports?
■ H0: The variables political inclination and network poll results are
independent. The difference in the reported proportion of poll respondents
who think Trump was responsible is due to chance (variability in sample
data).
■ HA: The variables political inclination and network poll results are not
independent. The difference we see is too large to be considered as chance
error. The political inclinations of the network may be affecting the results
that they report.

Research scenario 3
■ Sales pitches ~ A telephone sales solicitor has been given a
new sales pitch to compare against a current sales pitch. She
randomly alternated between them during a day of calls. Using
the new approach, 21 of 100 calls led to requests for the
mailing of additional product information. For the current
approach in the other 100 calls, only 14 led to requests for the
product information mailing.
■ Population of Interest: All potential customers for this product
■ Research Question: Does the new sales pitch increase the
rate of requests for product information?
■ H0: Sales pitches and Requests for mailing of additional
product information are independent. The observed
difference in the proportion of requests is due to chance.
■ HA: Sales pitches and Requests for mailing of additional
product information are not independent. The observed
difference in the proportion of requests is not due to chance.

Research scenario 4
■ Pollution and the weather ~ A meteorologist is concerned that
pollution associated with heavy traffic in the local metropolitan
area could be influencing weather patterns. He examines precipitation
records from a random sample of 700 days and finds that the
precipitation rate on weekends differs from the associated rate
for weekdays by 6.5 percentage points.
■ Population of Interest: All days
■ Research Question: Does the day of the week affect the
likelihood of precipitation?
■ H0: Weather and the day of the week are independent. The
observed difference in precipitation rates between the
weekdays and the weekends is due to chance.
■ HA: Weather and the day of the week are not independent.
The observed difference in precipitation rates between the
weekdays and the weekends is too large to be attributable to
chance variation.

The Law of Large Numbers
■ Below are two sets of data representing 100 coin flips
(1 = heads, 0 = tails). One data set is the result of laboriously
flipping a coin for far too long and recording each outcome. The
other one, I faked.
■ Which sequence do you think is real?
■ Set A
1 1 1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 1 1 1
1 1 0 1 1 0 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 0 1 1 1
0 1 0 0 1 0 0 0 0 0 1 1 1 1 0 0 0 1 1 0 0 1 1 1 0
1 1 1 1 0 1 1 1 0 0 1 0 0 1 1 1 1 0 1 0 0 1 0 1 0
■ Set B
1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0
1 1 0 1 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 1
1 0 1 1 0 1 0 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 0 1 0
1 1 0 0 1 1 0 1 0 1 1 0 0 1 1 1 0 0 1 1 1 0 0 1 0
■ TOPHAT

Coin flip questions
■ How many heads would you have expected to see in each
trial of 100 independent coin flips?
50
■ How many did we actually observe?
Set A: 55 Set B: 46
■ If we suppose that both sets are real and combine them
together, what is the percentage of heads in 200 tosses?
$\frac{55 + 46}{200} = 0.505 = 50.5\%$

Simulation as a tool
■ The Law of Large Numbers says that when observations
of a proportion are independent, the sample proportion
approaches the theoretical proportion as the number of
observations gets large.
■ https://shiny.stt.msu.edu/fairbour/LawAverages/
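The Law of Large Numbers can also be watched in a few lines of code. The sketch below is not the course app; it is a minimal illustration that assumes a fair coin and uses a hypothetical helper name (`running_proportion_of_heads`) and an arbitrary seed.

```python
import random


def running_proportion_of_heads(n_flips, seed=None):
    """Flip a fair coin n_flips times and return the proportion of heads after each flip."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for flip_number in range(1, n_flips + 1):
        heads += rng.randint(0, 1)  # 1 = heads, 0 = tails
        proportions.append(heads / flip_number)
    return proportions


# The seed is arbitrary; it only makes the run repeatable.
props = running_proportion_of_heads(100_000, seed=200)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>6} flips: proportion of heads = {props[n - 1]:.4f}")
```

Early proportions can wander well away from 0.5, while the later ones settle close to it, which is the behavior the law describes.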

The plan forward
We will now use both simulation and graphical
representations of simulated results to help us:
■ Answer questions about what we would expect
randomness to look like under a null hypothesis or
independence model.
■ Evaluate whether such a model fits observed results
from collected data or experiments.

Constructing an Independence Model
Government shutdown ~ Let’s look more closely at the government
shutdown example. We know from the polling information that the New
York Post reported that 637 out of 1062 American registered voters
said President Trump was responsible for the shutdown while 514 out of
1008 said the same according to FOX News.
■ If the polling data reported by two news networks are independent of
political inclinations, what would we expect the contingency table
and the mosaic plot of responses to look like?
Network                        Trump                      Others   Total
New York Post (left leaning)   1151/2070 x 1062 = 590.5   ____     1062
FOX News (right leaning)       ____                       ____     1008
Total                          1151                       919      2070

Constructing an Independence Model
Contingency table and the mosaic plot of responses:
Network                        Trump   Others   Total
New York Post (left leaning)   590.5   471.5    1062
FOX News (right leaning)       560.5   447.5    1008
Total                          1151    919      2070
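The expected counts under independence come from one rule: (row total x column total) / grand total. A minimal sketch of that arithmetic, using the totals from the shutdown polls; the function name `expected_counts` is only illustrative.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts if the row and column variables are independent."""
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


networks = {"New York Post": 1062, "FOX News": 1008}  # row totals
responses = {"Trump": 1151, "Others": 919}             # column totals

table = expected_counts(list(networks.values()), list(responses.values()))
for name, row in zip(networks, table):
    print(name, [round(cell, 1) for cell in row])
```

Each row of the output should match the corresponding row of the independence table, rounded to one decimal place.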

Actual Survey Responses
■ Could the difference we see be attributed to the type of
variability we expect to see in any sample data, or is the
difference too large to be explained by the random
chance that is a part of gathering a random sample?
■ The actual responses to the survey are given below, with
the expected counts in parentheses.
■ The mosaic plot of the actual data is shown on the right.
Network                        Trump         Others        Total
New York Post (left leaning)   637 (590.5)   425 (471.5)   1062
FOX News (right leaning)       514 (560.5)   494 (447.5)   1008
Total                          1151          919           2070

Competing claims
■ Null hypothesis claim: The difference we see is
attributable to the type of variability we expect to see in
any sample data.
■ Alternative hypothesis claim: The difference is too large
to be explained by the random chance that is part of
gathering a random sample.

The Avengers question
■ The Avengers is a long-running, popular comic book
series and has introduced 173 superhero characters
over the years.
■ Many of these characters have died.
■ Some have even died more than once, after returning
from their earlier “death(s)” in true comic book fashion.
■ I want to investigate if being an Avenger is equally risky
for male and female superheroes.

Testing the claim using randomization
■ Suppose I take a random sample of 30 Avengers and get the
following results:
Gender   Died at least once   Hasn’t died yet   Total
Female   5                    5                 10
Male     8                    12                20
Total    13                   17                30
■ What is the difference in death rate between female and male
Avengers in this sample?
$\frac{5}{10} - \frac{8}{20} = 0.5 - 0.4 = 0.1$
■ If I took a different sample, would I get the same result?
No. Since the sample would be different, we would also expect
the difference in death rate to be different.

Avengers Mosaic Plot showing difference in
proportions of death rates
$\frac{5}{10} - \frac{8}{20} = 0.1$

Due to chance?
■ I’m now going to take the same 10 female Avengers and
20 male Avengers and, using the idea that gender does not
affect the chance of dying, randomly select 13 to “die” and see
what the difference in death rate is “by chance”.
■ Do the simulation manually with Popsicle sticks and
Cards
■ Trial 1 difference:
■ Trial 2 difference:
■ Faster to do the simulations with computer apps!
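The popsicle-stick shuffle can be sketched in code as follows. This is not the course app, only a rough stand-in: the helper name, the seed, and the choice of 3000 shuffles are assumptions made for illustration. Exact fractions are used so that a shuffle landing exactly on 0.1 is counted correctly.

```python
import random
from fractions import Fraction


def shuffled_difference(rng):
    """One shuffle: 10 female and 20 male cards, 13 dealt at random into the 'died' pile."""
    cards = ["F"] * 10 + ["M"] * 20
    died = rng.sample(cards, 13)
    female_deaths = died.count("F")
    male_deaths = 13 - female_deaths
    return Fraction(female_deaths, 10) - Fraction(male_deaths, 20)


rng = random.Random(2024)  # arbitrary seed, for repeatability only
observed = Fraction(5, 10) - Fraction(8, 20)  # female rate minus male rate = 1/10

diffs = [shuffled_difference(rng) for _ in range(3000)]
as_extreme = sum(1 for d in diffs if abs(d) >= abs(observed))
print(f"Trial 1 difference: {float(diffs[0]):+.2f}")
print(f"Trial 2 difference: {float(diffs[1]):+.2f}")
print(f"{as_extreme} of {len(diffs)} shuffles were at least as far from 0 as 0.1")
```

The exact count changes from run to run, but a large share of shuffles should land at least as far from 0 as the sample did.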

Results of simulation
https://shiny.stt.msu.edu/fairbour/TwoProportion/

Evaluating the results
■ What is the center of the distribution shown in the graph? Is
this consistent with the null hypothesis? Why or why not?
Center of the distribution is 0 according to the null hypothesis
which states no difference. Consistent!
■ How often did random shuffles produce a difference at
least as large as 0.1, the observed difference in our sample?
■ Typo on notes
2102/3000 = 0.7007 or 70% of the time.
■ Does the independence model seem reasonable? Why or why
not?
Yes, since our sample result or something more extreme
happens over 70% of the time, our result seems to fit with the
independence model pretty well.

Example: Pass or fail? Hypotheses
Research question: Does a student’s background affect
their performance in the course?
■ H0: The variables background and course success are
independent. A student’s background does not affect
whether or not they pass the course. The observed
difference is due to chance.
■ HA: The variables background and course success are
not independent. A student’s background does affect
whether or not they pass the course. The observed
difference is not due to chance.

Observed sample data
■ Our instructor gathered data on a sample of 120
students. The table below summarizes this data.
Student Background   Pass   Fail   Total
Urban/Suburban       52     13     65
Rural/Small-town     30     25     55
Total                82     38     120

Pass or fail? – Mosaic Plot
■ We can look at the mosaic plot for this data to give us a
preview of whether the proportion of urban/suburban
students who pass is different from rural/small-town
students.
■ TOP HAT
The proportions are:
■ A. About the same
■ B. Different
■ C. I can’t tell

Simulating the study
■ We can use randomization to simulate what would happen if a
student’s background did not affect their course result.
■ We will first imagine how we could do this with cards (or craft
sticks) and then have a computer replicate the process many
times to produce a distribution of simulated results.
■ Row variable: In the simulation, we would write
Urban/Suburban on 65 cards and Rural/Small-town on 55
cards.
■ Column variable: Then we would thoroughly shuffle the cards
and deal 82 cards into one pile to represent those who
passed the course and put the remaining 38 cards in a pile to
represent those who did not pass the course.
■ Then we tabulate the results and determine the fraction of
urban students who passed and the fraction of rural students
who passed.
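One deal of the cards described above might look like this in code. It is a minimal sketch with an illustrative function name (`deal_once`) and an arbitrary seed, so the numbers it prints will not match Trial 1 or Trial 2 on the next slide.

```python
import random


def deal_once(rng):
    """Shuffle 65 urban and 55 rural cards, then deal 82 into the 'passed' pile."""
    cards = ["Urban/Suburban"] * 65 + ["Rural/Small-town"] * 55
    rng.shuffle(cards)
    pass_pile = cards[:82]   # the remaining 38 cards form the 'did not pass' pile
    urban_pass = pass_pile.count("Urban/Suburban")
    rural_pass = pass_pile.count("Rural/Small-town")
    return urban_pass, rural_pass


rng = random.Random(1)  # arbitrary seed
urban_pass, rural_pass = deal_once(rng)
print(f"Urban: {urban_pass}/65 = {urban_pass / 65:.2%} passed the course")
print(f"Rural: {rural_pass}/55 = {rural_pass / 55:.2%} passed the course")
print(f"Difference = {urban_pass / 65 - rural_pass / 55:+.2%}")
```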

Simulation results
■ Trial 1:
– Urban: 45/65 = 69.23% passed the course
– Rural: 37/55 = 67.27% passed the course
– Difference = 69.23% - 67.27% = 1.96%
■ Trial 2:
– Urban: 42/65 = 64.62% passed the course
– Rural: 40/55 = 72.73% passed the course
– Difference = 64.62% - 72.73% = -8.11%
■ Are these differences as extreme as the observed
difference of 25.5%?
■ No. Each of these trials gives a difference closer to 0
than 25.5%.

Notation, Notation, Notation!!!
■ When statisticians are referring to a sample proportion,
they often use the symbol $\hat{p}$.
■ We will use subscripts to help us distinguish which
sample proportion we are referring to.
■ In the Pass or fail? example, we will simply use the
subscripts Urban and Rural to refer to the two categories
of the variable Student Background.
■ Denote the difference in proportions of passes by
$\hat{p}_{\text{Urban}} - \hat{p}_{\text{Rural}}$

If the null hypothesis is true
■ If the two variables are indeed independent, we would expect
the difference $\hat{p}_{\text{Urban}} - \hat{p}_{\text{Rural}}$ to be 0, give or take some
sampling error.
■ Let’s repeat the randomization for 100 trials, and see if we
observe a difference of 0.255 (25.5%) or more. Identify the
observed result and the results of our trials on the graph
below.
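Repeating the randomization 100 times can be sketched as below. The function name and seed are assumptions, and the comparison uses the exact observed difference 52/65 - 30/55 (about 0.2545) rather than the rounded 0.255, so that a shuffle matching the sample exactly would still be counted.

```python
import random
from fractions import Fraction

N_URBAN, N_RURAL, N_PASS = 65, 55, 82
OBSERVED = Fraction(52, N_URBAN) - Fraction(30, N_RURAL)  # about 0.2545


def simulated_differences(trials, seed=None):
    """Urban minus rural pass rates from repeated shuffles under independence."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        # Students 0..64 are urban, 65..119 are rural; 82 are picked at random to pass.
        passed = rng.sample(range(N_URBAN + N_RURAL), N_PASS)
        urban_pass = sum(1 for student in passed if student < N_URBAN)
        rural_pass = N_PASS - urban_pass
        diffs.append(Fraction(urban_pass, N_URBAN) - Fraction(rural_pass, N_RURAL))
    return diffs


diffs = simulated_differences(100, seed=36)  # arbitrary seed
extreme = sum(1 for d in diffs if abs(d) >= OBSERVED)
print(f"{extreme} of {len(diffs)} shuffles were at least as far from 0 "
      f"as {float(OBSERVED):.3f}")
```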

Mapping our trial 1 and trial 2
[Figure: randomization distribution with the Trial 1 result, the Trial 2 result, and the observed result marked]

Pass or fail? – Evaluating the results
■ Approximately, how many simulated results were as far from the
expected difference of 0 as our actual observed result of 0.255 or
25.5%?
■ TOP HAT
■ The distribution of simulated results shows what differences in
proportions look like when they are due to chance or natural variation.
■ There were zero simulated results that were as far from the
expected difference of 0 as our observed sample difference.
■ That is, a difference in proportions of 0.255 (almost) never occurs by
chance or natural variation.
■ So, we would consider the observed difference of 0.255 very
convincing since this sample result is extremely rare if the null
hypothesis is true.
■ Based on our sample evidence, we are more inclined to think that
students’ background does appear to be associated with their
course results.

Two-sided question: Recall Avengers
■ How many simulated results were as far from the
expected difference of 0 as the actual data?

One tail (one-sided) tests
■ Sales pitches ~ In this example, the salesperson wants
to know if the new sales pitch leads to a higher rate of
requests for the product.
■ If it looks like the new pitch is nearly the same or even
worse, they will not bother with changing from the
current approach.
■ In such a situation, it makes sense to pay attention to
only one tail or one side of the randomization
distribution.

Sales pitches – Summarizing the study
■ Using the new approach, 21 of 100 calls led to requests
for the mailing of additional product information. For the
current approach in the other 100 calls, only 14 led to
requests for the product information mailing.
■ Use the information to fill in the table.
Sales Pitch Yes No Total
New pitch
Current pitch
Total

Sales pitches – Summarizing the study
■ Observed difference = 21/100 – 14/100 = 0.07
Sales Pitch Yes No Total
New pitch 21 79 100
Current pitch 14 86 100
Total 35 165 200

Sales pitches – Simulating the study
■ In the simulation, we would write new pitch on 100 cards
and current pitch on 100 cards.
■ Then we would thoroughly shuffle the cards and deal 35
cards into one pile to represent calls that led to a request
for product information and put the remaining 165 cards
in a pile to represent calls that did not lead to a request
for product information.

Sales pitches - Hypotheses
Because we only care to see if the new pitch increases
sales, our alternative hypothesis should reflect this.
■ H0: The new and current sales pitches are equally
effective.
■ HA: The new sales pitch is better than the current sales
pitch and leads to a higher rate of requests for more
information.

Sales pitches - Randomization results
[Figure: randomization distribution of simulated differences, with the observed difference of 0.07 marked]

Sales pitches –Evaluating the results
■ Does it appear that our actual result is typical of the results
simulated under the null hypothesis?
■ Would you switch to the new sales pitch with this kind of
evidence? Why or why not?
■ TOP HAT
■ Maybe, but it is not very typical.
■ Our observed difference (or something larger) occurs about
13% of the time, which is not high enough to be typical.
■ So maybe the new sales pitch is better, or maybe it is not.
■ The salesperson would need to make a judgment call on
whether or not they found this evidence convincing in favor of
the new sales pitch.
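The one-sided count behind the "about 13%" figure can be approximated with a short simulation. This is a sketch under the card-shuffling model from the earlier slide, with an illustrative function name and seed; only shuffles where the new pitch does at least as well as observed are counted.

```python
import random


def one_sided_p_value(trials, seed=None):
    """Share of shuffles where new minus current is at least the observed 0.07."""
    rng = random.Random(seed)
    cards = ["new"] * 100 + ["current"] * 100
    at_least_as_large = 0
    for _ in range(trials):
        requests = rng.sample(cards, 35)  # the 35 calls that led to a request
        new_requests = requests.count("new")
        current_requests = 35 - new_requests
        # Both groups have 100 calls, so a difference in proportions of 0.07
        # is the same as a difference in counts of 21 - 14 = 7.
        if new_requests - current_requests >= 21 - 14:
            at_least_as_large += 1
    return at_least_as_large / trials


p_one_sided = one_sided_p_value(5000, seed=46)  # arbitrary seed
print(f"Estimated one-sided p-value: {p_one_sided:.3f}")
```

Only the upper tail is counted here because the alternative hypothesis says the new pitch is better, not merely different.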

Government Shutdown - Hypotheses
■ Recall the null and alternative hypotheses for the
government shutdown polling example.
■ H0: The variables political inclination and network poll
results are independent. The difference in the reported
proportion of poll respondents who think that Trump was
responsible is due to chance (variability in sample data).
■ HA: The variables political inclination and network poll
results are not independent. The difference we see is too
large to be considered as chance error. The political
inclinations of the network may be affecting the results
that they report.

Government shutdown – Observed data
Network Blame Trump Blame Others Total
New York Post
(left leaning)
637 425 1062
FOX News
(right leaning)
514 494 1008
Total 1151 919 2070

Government shutdown - simulation
■ We can use randomization to simulate what would
happen if the political inclination of the network did not
affect the poll results.
■ In the simulation, we would write New York Post on 1062
cards and FOX News on 1008 cards.
■ Then we would thoroughly shuffle the cards and deal
1151 cards into one pile to represent those who think
Trump is responsible and put the remaining 919 cards in
a pile to represent others.

Government shutdown - Randomization
■ If the two variables are indeed independent, we would again expect
the difference $\hat{p}_{\text{FOX}} - \hat{p}_{\text{NYP}}$ to be 0, give or take some sampling
error.
■ What proportion of the simulated results are as far from 0 as our
observed result of $\frac{514}{1008} - \frac{637}{1062} = -0.08989$?

More trials are better
■ Actually, 100 is a fairly small number of simulations to
run, so let’s see what happens when we increase that to
5000.

■ In this case, our observed result is never seen if the null
hypothesis is true, that is, if the two variables are
independent.
■ It is easy to believe that the null hypothesis is not
compatible with the observed data.
■ We may want to further investigate whether the political
affiliation of the network is affecting the polling results
that they report.
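The shutdown randomization can be sketched the same way as the earlier examples. The function name and seeds below are assumptions, and the demo runs only 100 and 1000 shuffles to stay quick; the function takes the number of trials as an argument, so 5000 works too.

```python
import random
from fractions import Fraction

N_NYP, N_FOX, N_TRUMP = 1062, 1008, 1151
OBSERVED = Fraction(514, N_FOX) - Fraction(637, N_NYP)  # about -0.08989


def estimated_p_value(trials, seed=None):
    """Share of shuffles whose FOX minus NYP difference is at least as far from 0 as OBSERVED."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Cards 0..1061 are New York Post respondents, the rest are FOX News respondents.
        blame_trump = rng.sample(range(N_NYP + N_FOX), N_TRUMP)
        nyp_trump = sum(1 for card in blame_trump if card < N_NYP)
        fox_trump = N_TRUMP - nyp_trump
        diff = Fraction(fox_trump, N_FOX) - Fraction(nyp_trump, N_NYP)
        if abs(diff) >= abs(OBSERVED):
            hits += 1
    return hits / trials


for trials in (100, 1000):
    print(f"{trials:>5} trials: estimated p-value = {estimated_p_value(trials, seed=trials)}")
```

With so few shuffles the estimate is often exactly 0, which should be read as "very small", not as "impossible".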

Pollution and the weather
■ Recall the meteorologist who is concerned that pollution
associated with heavy traffic in the local metropolitan
area could be influencing weather patterns.
■ H0: Weather and the day of the week are independent.
The observed difference in precipitation rates between
the weekdays and the weekends is due to chance.
■ HA: Weather and the day of the week are not
independent. The observed difference in precipitation
rates between the weekdays and the weekends is too
large to be attributable to chance variation.

Observed weather data
Day Type Precipitation No Precipitation Total
Weekday 265 235 500
Weekend 119 81 200
Total 384 316 700
Calculate the observed value of the
difference:
$\hat{p}_{\text{Weekday}} - \hat{p}_{\text{Weekend}} = \frac{265}{500} - \frac{119}{200} = -0.065$

■ We can use randomization to simulate what would
happen if the type of day does not affect precipitation.
■ In the simulation, we would write weekday on 500 cards
and weekend on 200 cards.
■ Then we would thoroughly shuffle the cards and deal
384 cards into one pile to represent days with
precipitation and put the remaining 316 cards in a pile to
represent days without precipitation.
■ Then we tabulate the results and again determine
$\hat{p}_{\text{Weekday}} - \hat{p}_{\text{Weekend}}$. Let’s skip doing this by hand
this time and go straight to the computer simulation.

Randomization distribution 1

Randomization distribution 2

Investigating the simulated results
■ In what proportion of simulated results was the difference
at or below our observed result of -0.065?
$\frac{68}{1000} = 0.068$
■ In what proportion of simulated results was the difference
either at or below -0.065 or at or above 0.065?
$\frac{126}{1000} = 0.126$
■ Which of the above proportions should we use to make
our decision? Why?
Since we don’t have any reason to think that rain is more or
less likely on the weekend, we should use the second
proportion to make our decision.
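Both proportions can be estimated from the same set of shuffles. The sketch below assumes the card model from the earlier slide (500 weekday cards, 200 weekend cards, 384 dealt into the precipitation pile); the function name and seed are illustrative, and the results will be close to, not identical to, the counts on the slide.

```python
import random
from fractions import Fraction

OBSERVED = Fraction(265, 500) - Fraction(119, 200)  # weekday minus weekend = -0.065


def tail_proportions(trials, seed=None):
    """Return (lower-tail proportion, two-tail proportion) from shuffles under independence."""
    rng = random.Random(seed)
    lower = upper = 0
    for _ in range(trials):
        wet_days = rng.sample(range(700), 384)  # days 0..499 are weekdays
        wet_weekdays = sum(1 for day in wet_days if day < 500)
        wet_weekends = 384 - wet_weekdays
        diff = Fraction(wet_weekdays, 500) - Fraction(wet_weekends, 200)
        if diff <= OBSERVED:
            lower += 1
        elif diff >= -OBSERVED:
            upper += 1
    return lower / trials, (lower + upper) / trials


one_tail, two_tail = tail_proportions(1000, seed=58)  # arbitrary seed
print(f"at or below -0.065:               {one_tail:.3f}")
print(f"at or below -0.065 or above 0.065: {two_tail:.3f}")
```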

Weather conclusions
■ Based on a comparison of our observed result to the
simulated results under the null hypothesis, does the null
hypothesis seem compatible with the observed result?
Justify your answer based on the simulation.
■ Since our sample result or something more extreme
occurs more than 10% of the time in the randomization,
we would not abandon the null hypothesis.
■ Has your belief about the weather changed because of
the simulation? Is further investigation warranted?
Explain why or why not.

Logic of randomization simulations Page 71
In each of our examples so far we have set up an independence
model to reflect the null hypothesis claim that the explanatory
variable is independent of the response variable.
■ In the Avengers and the Sales pitches examples, our data did
not appear out of the ordinary, and so we had insufficient
evidence to conclude that there was a relationship between
the explanatory and response variables. This is the same as
saying the independence model was a good fit for our data.
■ In the Pass or fail and Government shutdown examples, our
data appeared to be very out of the ordinary, and so we had
sufficient evidence to conclude that there was a relationship
between the explanatory and response variables. This is the
same as saying the independence model was a bad fit for our
data.

2.3 Hypothesis Testing
The p-value
■ In the examples we have done so far, we have used
randomization to simulate results based on the null
hypothesis of no difference or independence.
■ In each case we calculated the proportion of simulated
results that were at least as favorable to the
alternative hypothesis as our observed result.
■ As we saw in Chapter 0, relative frequencies can be
thought of as probabilities, so we can think of this
proportion as an estimate of the probability of observing
a result as favorable to the alternative hypothesis as our
observed data.
■ We call this proportion the p-value.

US COURT SYSTEM bottom page 72
Research question and hypotheses:
■ Is the defendant guilty of a crime?
■ H0: The defendant is not guilty (innocent)
■ HA: The defendant is guilty
■ Collect data: Detectives investigate the crime.
■ Analyze data: Prosecution and defense present the result of
the investigations in court.
■ Evaluating the evidence: A jury deliberates about whether the
prosecution has provided evidence that the defendant is guilty
beyond a reasonable doubt.
■ In practice, juries and judges have to make a decision in favor
of one hypothesis or the other.
■ In statistics and in science, we generally try to be more
modest in our conclusions.

CSI: Murder of Dr. Duck
■ Dr. Duck is murdered
■ Blood is found at the crime scene
■ Everyone in this lecture theatre is a suspect
■ Primary suspect: Monk says “Dr. H is the guy”
■ Scenario 1: Blood found at the crime scene is Blood type
O and is a match for Dr. H
■ Scenario 2: DNA found at the crime scene is a match for
Dr. H.
■ Top Hat

CSI: Murder of Dr. Duck
EVALUATION:
The p-value is extremely small. Here the p-value is the
chance of a DNA match if Dr. H is innocent. This is why
“Dr. H is the guy”.

The p-value as a probability
■ The p-value of a test is a probability calculated using a model
based on the null hypothesis being tested.
■ It is the probability of observing data at least as favorable to
the alternative hypothesis as our current data set, using the
null model.
■ NOTE: The p-value is a conditional probability.
We can use p-values to help us evaluate how well the null
hypothesis model explains or “fits” our observed sample results.
■ If the p-value is large this indicates that our observed results
look like they could be a result of the natural variation that we
expect to see when we take random samples.
■ The smaller a p-value is, the less inclined we will be to think
that our sample result is simply due to natural variation.
■ In other words, small p-values give us reason to DOUBT that
the null model is a good predictor for the observed results we
have.
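As a rough sketch, the p-value from a randomization test is just a proportion of simulated results. The helper below is hypothetical (it is not part of any course software), and its two-sided branch assumes the null model is centered at 0, as it is for a difference in proportions. The toy list is made up purely to show the counting.

```python
def p_value(simulated, observed, alternative="two-sided"):
    """Proportion of simulated results at least as favorable to the alternative as `observed`."""
    if alternative == "greater":
        hits = sum(1 for s in simulated if s >= observed)
    elif alternative == "less":
        hits = sum(1 for s in simulated if s <= observed)
    elif alternative == "two-sided":
        # Assumes the null distribution is centered at 0.
        hits = sum(1 for s in simulated if abs(s) >= abs(observed))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return hits / len(simulated)


# Made-up simulated differences, for illustration only (not course data).
toy_results = [-3, -2, -1, -1, 0, 0, 0, 1, 1, 2]
print(p_value(toy_results, 2, "greater"))    # counts results >= 2
print(p_value(toy_results, 2, "two-sided"))  # counts results <= -2 or >= 2
```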

P-value table: Must Memorize!!!
■ Later in the course, we will discuss in more detail the
reasons for considering a particular p-value either “large”
or “small”.
■ For now we will use a (somewhat arbitrarily) chosen scale
for evaluating the amount of evidence a p-value gives us
to doubt the compatibility of the null model with our data.
■ For example, if the p-value is between 0.01 and 0.05, we
will say we have strong evidence that the null model is
not a good fit for our observed sample results.

One-sided and two-sided hypotheses
■ In the dolphin therapy experiment you investigated in
recitation, we considered whether dolphin therapy was a more
effective treatment for depression than the control of an outdoor
nature program without dolphins, but ignored the possibility
that it might be less effective.
■ By framing the hypotheses so that they reflect the direction
our observed data is pointing, we can fall victim to
confirmation bias—looking only for the results we expect to
see.
■ To help with this, we need to adjust our alternative hypothesis.
Example: Dolphin therapy
■ In the dolphin therapy example, our sample data indicated
that patients who used dolphin therapy were more likely to
have a successful outcome.
■ Difference = $\hat{p}_{\text{Dolphin}} - \hat{p}_{\text{Control}} = 0.4667$

Avoid Confirmation Bias
We set up our hypotheses to reflect this initial result:
■ H0: Therapy type and substantial improvement are independent.
Dolphin therapy and the control group therapy are equally effective
at improving depression. The observed difference is due to chance.
■ HA: Therapy type and substantial improvement are not independent.
The observed difference is not due to chance and the success rate
for dolphin therapy is higher than for the control group.
■ To avoid confirmation bias, we can more neutrally wonder if dolphin
therapy is either more or less effective than the control therapy. We
should change the alternative hypothesis to reflect this more modest
viewpoint. (The null hypothesis remains the same.)
New hypotheses:
■ HA: Therapy type and substantial improvement are not independent.
The observed difference is not due to chance and the success rate
for dolphin therapy is different from the success rate for the control
group.

Two sided hypotheses and p-values
■ Since the p-value is the chance we observe a result at least as
favorable to the alternative as the result in our data, any difference
less than or equal to -0.4667 would also provide equally strong
evidence favoring HA.
■ In one run of 3000 trials, 2.5% were at least as far from 0 as
0.4667. The two shaded tails below provide a visual representation
for the two-sided test.

Caution!!!
Calculating two-sided p-values
■ For a two-sided test, we often will take the area (or count of
observations) in a single tail and double it to get the p-value.
■ The app will count observations “beyond” the observed value
in the data.
■ Use this value as an approximate p-value for randomization
tests.
■ Later in the course we will learn techniques that will add to our
methods for calculating p-values.
■ Unless you are only interested in one side or the other, you
should perform a two-sided test as the default.
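The doubling rule is simple arithmetic. The sketch below applies it to the weather example, where 68 of 1000 simulated results fell in the lower tail; the function name is illustrative, and the cap at 1 is an added safeguard, since a probability cannot exceed 1.

```python
def doubled_tail_p_value(tail_count, trials):
    """Approximate two-sided p-value: double the single-tail proportion, capped at 1."""
    return min(1.0, 2 * tail_count / trials)


doubled = doubled_tail_p_value(68, 1000)
print(f"Doubling the lower tail: {doubled}")
print(f"Counting both tails:     {126 / 1000}")
```

The two answers are close but not identical, which is why the doubled value is described as an approximate p-value.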
Caution!
■ Hypotheses should be set up before seeing the data.
■ Switching to a one-sided test after performing the experiment
is bad statistical practice!
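The tail-doubling rule can be sketched in a few lines of Python. This is only an illustration: the function name `two_sided_p_value` and the list `toy_differences` are hypothetical, the toy values are made up for the example (they are not results from the dolphin study), and the sketch assumes a statistic whose null distribution is centered at 0, such as a difference in proportions.

```python
def two_sided_p_value(simulated, observed):
    """Approximate a two-sided p-value by doubling a single tail.

    Assumes the null distribution is centered at 0 (for example, a
    difference in sample proportions).
    """
    if observed >= 0:
        # Count simulated statistics at or beyond the observed value (upper tail).
        tail_count = sum(1 for s in simulated if s >= observed)
    else:
        # Count simulated statistics at or beyond the observed value (lower tail).
        tail_count = sum(1 for s in simulated if s <= observed)
    # Double the single tail; a probability cannot exceed 1.
    return min(1.0, 2 * tail_count / len(simulated))


# Hypothetical toy values, for illustration only.
toy_differences = [-0.5, -0.3, -0.1, -0.1, 0.0, 0.0, 0.1, 0.1, 0.3, 0.5]
print(two_sided_p_value(toy_differences, 0.3))
```

Doubling one tail and counting both tails directly give similar answers when the simulated distribution is roughly symmetric.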

Section 2.4: Example: “Boomeranging”
■ Research from the National Longitudinal Survey of Youth 1997 revealed
that 13% of men born between 1980 and 1984 had left their parental
home and then moved back in later.
■ This return is sometimes referred to as “Boomeranging” and has
implications for housing and household formation that affect many
areas of the economy.
■ To test whether this percentage has changed, suppose researchers take
a sample of 150 men born between 1988 and 1992 and find that 25 of
them had left their parental home and then returned.
■ Question: Is this evidence that the percentage has changed?
■ Notice that in this scenario, we are interested in only one categorical
variable, whether or not a person has moved back in with their parents
after leaving home.
■ As in our previous hypothesis test situations, we have just one sample
and observed result, but this time we are interested in a single
proportion, not the difference between two sample proportions.
■ Conveniently, the logic and structure of the hypothesis test procedure
will remain the same.

Example: “Boomeranging”
1. What is the research question?
Has the rate of boomeranging changed?
2. What is the population of interest? What is the claimed
(or historic value of the) parameter?
Population: All men born between 1988 and 1992
Claimed parameter: Population proportion is 13%
3. Should we construct one-sided or two-sided
hypotheses? Why?
Two-sided is more appropriate, since the percentage could
have gone up or down.

4. What is the observed sample statistic?
25/150 = 16.67%
5. Write the null and alternative hypotheses in the space
below.
■ H0: The rate of boomeranging has not changed from
13%. The observed difference is due to chance.
■ HA: The rate of boomeranging has changed from 13%.
The observed difference is not due to chance.
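In symbols, the hypotheses and the observed statistic can be sketched as follows. The symbol $p$ is notation assumed here, not taken from the slides: it stands for the proportion of all men born between 1988 and 1992 who left their parental home and then moved back in.

```latex
\begin{align*}
H_0 &: p = 0.13 \\
H_A &: p \neq 0.13 \\
\hat{p} &= \frac{25}{150} \approx 0.1667
\end{align*}
```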

■ We can use randomization to simulate what would happen if
the rate were still 13%.
■ We simulate the study using a stack of cards. Imagine we have
13 green cards and 87 white cards that we shuffle thoroughly,
and then select the top card from the pile. (Note that there is
a 13/100 = 13% chance of the top card being green.)
■ Drawing a green card would be like choosing a young man
who has left and then moved back in with his parents, if the
null hypothesis is true.
■ Drawing a white card would be like choosing someone who
had not.
■ We repeat the shuffle-and-draw process 150 times, recording
the color, then replacing the card and reshuffling each time.
■ Note: Drawing the card with replacement is important. If we
did not put the card back each time, the proportion of green
and white cards would change from shuffle to shuffle—and we
would run out of cards before we were done!
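The card procedure above can be mimicked on a computer. The following is a minimal sketch using Python's standard `random` module; the function name `draw_sample`, the seed, and the choice of 2000 repetitions are assumptions made for illustration, not part of the slides.

```python
import random


def draw_sample(rng, n_draws=150, n_green=13, n_white=87):
    """Draw n_draws cards WITH replacement and return the proportion of green."""
    deck = ["green"] * n_green + ["white"] * n_white
    # rng.choices samples with replacement, like putting the card back
    # and reshuffling before every draw.
    draws = rng.choices(deck, k=n_draws)
    return draws.count("green") / n_draws


rng = random.Random(200)  # fixed seed (arbitrary) so the run is repeatable
proportions = [draw_sample(rng) for _ in range(2000)]
mean_proportion = sum(proportions) / len(proportions)
print(f"first simulated proportion: {proportions[0]:.3f}")
print(f"mean of all simulated proportions: {mean_proportion:.3f}")
```

Each entry of `proportions` plays the role of one "Result" from a full set of 150 shuffle-and-draws, and the mean of the list should sit close to the null value of 0.13.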

Here are some example results:
■ Result #1:
White cards: 126, Green cards: 24, so $\hat{p}_{\text{Green}} = \frac{24}{150} = 0.16$
■ Result #2:
White cards: 133, Green cards: 17, so $\hat{p}_{\text{Green}} = \frac{17}{150} \approx 0.113$
■ Just like in the earlier part of the chapter we use a computer
simulation to run this experiment many times so we can see the null
distribution.
Randomization distribution (SIMULATION)
■ We use a computer to repeat this experiment many times. After
each set of 150 shuffle-and-draws, we record the sample proportion
of green cards, and then create a dotplot/histogram of the results.
■ Where is this distribution centered?

Randomization simulation
[Figure: randomization distribution of the simulated sample proportions; the center is the null value 0.13]

■ Where is this distribution centered?
At the expected value of the sample proportion under the null model,
that is, the null value 0.13.
(Notice it is NOT centered at 0 like the distribution of differences!)


“Beyond” means:
The fraction of simulated proportions at least as far from the null
population proportion of 0.13 as our sample result of 0.167. That is,
proportions of 0.167 or more (at least 0.037 above 0.13) together with
proportions of 0.093 or less (at least 0.037 below 0.13).
Center is 0.13

■ Approximately how often did the simulated results show a value further
away from 0.13 than our sample result of 25/150 = 0.167?
■ What is the approximate p-value? p-value = 453/2000 = 0.2265
Evaluating the results: Just like before, we can use the p-value from the results
of the randomization to help us decide whether the null hypothesis model is a
good fit for our observed results.
■ In other words, is our observed result something that we could expect to see
if the population proportion was still 13%?
Yes. In 22.65% of the simulations, the simulated proportion was at least
0.037 above or at least 0.037 below the null population proportion of 0.13.
■ What kind of evidence do we have to adjust our thinking about the
population proportion?
There is little evidence that the population proportion is different from 0.13.
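The "beyond" count can be reproduced with a short simulation. This is a sketch under stated assumptions, not the app used in the course: the function name `boomerang_p_value`, the seed, and the default of 5000 repetitions are all hypothetical choices.

```python
import random


def boomerang_p_value(n_sims=5000, n_draws=150, null_percent=13,
                      observed_count=25, seed=200):
    """Two-sided randomization p-value for a single proportion."""
    rng = random.Random(seed)
    deck = ["green"] * null_percent + ["white"] * (100 - null_percent)
    # Compare 100 * count against 100 * expected count so that the
    # "at least as far from the null" check uses exact integers.
    expected_x100 = n_draws * null_percent  # 1950, i.e. 19.5 green cards
    observed_gap = abs(100 * observed_count - expected_x100)  # 550
    beyond = 0
    for _ in range(n_sims):
        greens = rng.choices(deck, k=n_draws).count("green")
        if abs(100 * greens - expected_x100) >= observed_gap:
            beyond += 1  # 25 or more greens, or 14 or fewer
    return beyond / n_sims


p_value = boomerang_p_value()
print(f"approximate two-sided p-value: {p_value:.3f}")
```

The result changes from run to run and with the seed, so it will not match 453/2000 = 0.2265 exactly, but it should land in the same neighborhood and lead to the same conclusion.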

Example: Web Design
An online art gallery website has both free and premium accounts for
users. Over the past two years, the percentage of premium accounts
has held steady at 25%.
Designers have proposed some changes to the premium features but
before the website owners invest in the redesign, they want to know if
the proportion of premium accounts is likely to increase.
They survey a random sample of 500 users and find that 150 of them
say they would either continue with or switch to a premium account if
the new features were included.
■ What is the research question?
Will the redesign increase the proportion of premium accounts?
■ What is the population of interest? What is the parameter of
interest?
All users of the website. The proportion of all users who would continue
with or switch to a premium account.

Hypotheses:
H0: There is no change in the proportion of premium account users from
the historical value of 25%.
HA: There is an increase in the proportion of premium account users
from 25%.
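In symbols, with $p$ (notation assumed here, not taken from the slides) denoting the proportion of all users who would hold a premium account after the redesign, the one-sided hypotheses can be written as:

```latex
\begin{align*}
H_0 &: p = 0.25 \\
H_A &: p > 0.25
\end{align*}
```

Because the alternative points in only one direction, only simulated proportions at or above the observed value count toward the p-value.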

■ What is the sample proportion of users who say they will
continue with or switch to a premium account?
150/500 = 0.30
Perform the simulation:
■ Let a card represent a user of this website. Start with 100
cards, and take 25 green cards to represent premium users
and 75 white cards to represent the free users.
■ Shuffle and draw a card. Record the color and place the card
back in the pile.
■ Do this 500 times. Calculate the sample proportion of green
cards in each set of shuffle and draws.
■ Repeat this procedure 3000 times to form a distribution of the
sample proportions.

■ Give an estimate for the p-value (say, out of 3000 simulations).
Very small, well under 0.05 (on the order of 0.005).
■ What would you recommend to the website owners?
Go ahead with the redesign. There is strong evidence that the redesign
will increase the proportion of premium users.
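The one-sided simulation for this example can be sketched the same way as the card procedure. The function name `premium_p_value` and the seed are assumptions for illustration; the deck of 25 green and 75 white cards, the 500 draws, and the 3000 repetitions follow the steps listed earlier.

```python
import random


def premium_p_value(n_sims=3000, n_draws=500, n_green=25, n_white=75,
                    observed_count=150, seed=200):
    """One-sided randomization p-value: how often the null model gives
    at least as many premium users as the survey did."""
    rng = random.Random(seed)
    deck = ["green"] * n_green + ["white"] * n_white
    at_or_above = 0
    for _ in range(n_sims):
        # 500 draws with replacement from the 25/75 deck.
        greens = rng.choices(deck, k=n_draws).count("green")
        if greens >= observed_count:  # sample proportion of 0.30 or more
            at_or_above += 1
    return at_or_above / n_sims


p_value = premium_p_value()
print(f"approximate one-sided p-value: {p_value:.4f}")
```

Only the upper tail is counted here because HA claims an increase; the estimate varies from run to run but stays very small.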