Introduction to Statistics
STA250

Lecture 11 - April 21st, 2010
                                1
Probability


✤   How we express likelihood mathematically

✤   For an event “A”, the probability of A occurring is denoted “P(A)”

✤   Always a number between 0 and 1

    ✤   P(A) = 0 means that A never happens

    ✤   P(A) = 1 means that A always happens


                                     4
Independence & Exclusivity

✤   independence - A and B are independent if the occurrence of one does
    not affect the probability of the other:

    ✤   P(A|B) = P(A) = P(A|not B)

    ✤   P(B|A) = P(B) = P(B|not A)

✤   mutually exclusive - A and B are mutually exclusive if it is impossible
    for both of them to occur:

    ✤   P(A and B) = 0

                                       5
Probability Rules

✤   Probability of not happening is 1 minus probability of occurring

    ✤   P(not A) = 1 - P(A)



✤   When A and B are independent:

    ✤   P(A and B) = P(A) × P(B)



✤   P(A or B) = P(A) + P(B) - P(A and B)
                                     6
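
The three rules above can be checked numerically. Below is a minimal simulation sketch (not from the slides; the probabilities p_a and p_b are arbitrary example values) for two independent events:

```python
import random

random.seed(0)
p_a, p_b = 0.30, 0.50          # arbitrary example probabilities
trials = 100_000

not_a = a_and_b = a_or_b = 0
for _ in range(trials):
    a = random.random() < p_a   # event A
    b = random.random() < p_b   # event B, generated independently of A
    not_a   += (not a)
    a_and_b += (a and b)
    a_or_b  += (a or b)

print(not_a / trials,   "vs", 1 - p_a)                # P(not A) = 1 - P(A)
print(a_and_b / trials, "vs", p_a * p_b)              # P(A and B) = P(A) × P(B)
print(a_or_b / trials,  "vs", p_a + p_b - p_a * p_b)  # P(A or B) = P(A) + P(B) - P(A and B)
```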
Probability Fundamentals


✤   Sum of probabilities of all possible outcomes is 1

✤   Flip a coin and you get either heads or tails:

    ✤   P(heads) + P(tails) = 1 = P(heads or tails)

✤   With mutually exclusive outcomes A, B, C, and D covering all possible outcomes

    ✤   P(A) + P(B) + P(C) + P(D) = 1 = P(A or B or C or D)


                                        7
Conditional Probability

✤   With non-independent events, knowing one has happened may change
    the likelihood of the other occurring

✤   Conditional probability - what is the probability of A given that B has
    already happened?

    ✤   P(A|B)

✤   Bayes Rule for conditional probability:
                 P(A|B) = P(A and B) / P(B) = [P(B|A) × P(A)] / P(B)
                                       8
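
As a minimal sketch of the first equality (added here for illustration, not part of the slides), take one fair die with A = "roll is even" and B = "roll is 4 or more" and verify P(A|B) = P(A and B) / P(B) by counting outcomes:

```python
from fractions import Fraction

outcomes = set(range(1, 7))                 # one fair die
A = {x for x in outcomes if x % 2 == 0}     # A: roll is even
B = {x for x in outcomes if x >= 4}         # B: roll is 4 or more

p_b         = Fraction(len(B), 6)
p_a_and_b   = Fraction(len(A & B), 6)
p_a_given_b = p_a_and_b / p_b               # definition of conditional probability

print(p_a_given_b)                          # 2/3
```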
Conditional Probability Hoedown


✤   At John Jay, 62.5% of all students hate statistics while 25% of all
    students hate statistics and passed the class. What is the probability
    that a student passes stats given that the student hates statistics?



✤   Two fair dice are rolled, what is the (conditional) probability that
    exactly one die’s value is a 1 or 2 given that they show different
    numbers?


                                       9
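
Both exercises can be checked with the conditional-probability formula and a brute-force enumeration of the 36 dice outcomes. This is a sketch added for illustration, not part of the lecture:

```python
from fractions import Fraction
from itertools import product

# John Jay example: P(pass | hate) = P(pass and hate) / P(hate)
p_hate          = Fraction(625, 1000)    # 62.5% hate statistics
p_hate_and_pass = Fraction(25, 100)      # 25% hate statistics and passed
print(p_hate_and_pass / p_hate)          # 2/5 = 0.4

# Dice example: condition on the two dice showing different numbers
different   = [(a, b) for a, b in product(range(1, 7), repeat=2) if a != b]
exactly_one = [(a, b) for a, b in different
               if (a in (1, 2)) != (b in (1, 2))]    # exactly one die shows 1 or 2
print(Fraction(len(exactly_one), len(different)))    # 8/15
```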
Something Really Important



✤   Classic stats problem emerged from the game show Let’s Make a Deal,
    often called the Monty Hall Problem after the show’s host

✤   Has ended many friendships and caused bitter internet arguments




                                    10
The Game


✤   There are 3 doors labeled “1”, “2”, and “3”, behind one of these doors
    is a fabulous prize that Monty has hidden

✤   You get to choose a door, which may or may not have the prize

✤   Monty opens another door without revealing the prize

✤   You now have the option to stay with your door or switch to another,
    should you stick with your original choice or switch?


                                     11
Choosing The First Door



✤   Three doors and one prize so you’ll pick the right door one out of
    three times, i.e. P(right first choice) = 1/3

✤   Likewise, you’ll pick the wrong door with P(wrong first choice) = 2/3




                                     12
The Reveal


✤   No matter how you choose, there are two other doors and only one prize.
    This means at least one of the two unchosen doors has nothing behind it.

✤   Monty knows where the prize is and opens an unchosen door that DOESN’T
    have the prize behind it.

✤   This leaves your door and one other. One of them has the prize and
    the other doesn’t, should you switch?


                                    13
To Switch, or Not To Switch


✤   You don’t know if you have the right door!

✤   What’s the probability that your door has the prize?

✤   What’s the probability that the other door has the prize?

✤   What’s the probability that your door doesn’t have the prize?



                                     14
Example of the Game


✤   As an example, the prize is hidden behind door “3”.

✤   If you choose door “3” initially, switching can only lose you the prize

✤   If you choose door “2” initially, Monty must open door “1” and
    switching will get you the prize

✤   If you choose door “1” initially, Monty must open door “2” and
    switching will get you the prize


                                      15
Switch Already!


✤   Switching is a way of saying “I don’t think the prize is behind this
    door”

✤   Since the probability is 1/3 that the prize is behind any one door, the
    probability is 2/3 that the prize is not behind that door

✤   Always switch and you’ll win 2/3 of the time!



                                      16
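
A short simulation (a sketch added here, not part of the original slides) makes the 1/3 versus 2/3 split easy to see:

```python
import random

random.seed(1)
trials = 100_000
stay_wins = switch_wins = 0

for _ in range(trials):
    prize  = random.choice([1, 2, 3])
    choice = random.choice([1, 2, 3])
    # Monty opens a door that is neither the player's choice nor the prize
    opened   = random.choice([d for d in (1, 2, 3) if d != choice and d != prize])
    switched = next(d for d in (1, 2, 3) if d != choice and d != opened)
    stay_wins   += (choice == prize)
    switch_wins += (switched == prize)

print("stay:  ", stay_wins / trials)    # close to 1/3
print("switch:", switch_wins / trials)  # close to 2/3
```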
Expected Values

✤   Probability can be used to estimate rewards in a game of chance

    ✤   Expected Value = P(A)×Reward(A) + P(B)×Reward(B) + ...



✤   Silly coin-flipping game: flip a coin three times; if you get exactly one
    Heads, you get a dollar. If not, you give me a dollar.

✤   Should you take the bet?

                                     17
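
Working the coin game out (a sketch added for illustration): there are 8 equally likely sequences of three flips, 3 of which contain exactly one Heads, so the expected value is (3/8)(+$1) + (5/8)(-$1) = -$0.25 per play.

```python
from fractions import Fraction
from itertools import product

flips = list(product("HT", repeat=3))                 # 8 equally likely sequences
p_win = Fraction(sum(seq.count("H") == 1 for seq in flips), len(flips))

expected_value = p_win * 1 + (1 - p_win) * (-1)       # win $1 with p_win, else lose $1
print(p_win, expected_value)                          # 3/8 and -1/4: don't take the bet
```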
Normal Distribution

✤   The distribution is

    ✤   unimodal

    ✤   symmetric

    ✤   “light tailed”

✤   Notation: X ~ N(μ, σ) means “the random variable X has a normal
    distribution with mean μ and standard deviation σ”

                                   18
Area Under the Curve Equals 1

            [Figure: density curves f(x) for N(−3, 0.5), N(2, 1), and N(−1, 3),
            plotted for x from −4 to 4]
                            19
Rules of Thumb


✤   P(within one standard deviation) = 0.68

✤   P(within two standard deviations) = 0.95

✤   P(within three standard deviations) = 0.997



✤   With “real” normal distributions, you just don’t get outliers!


                                      20
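
The rounded values above can be checked against exact normal areas. A minimal sketch, assuming scipy is available:

```python
from scipy.stats import norm

for k in (1, 2, 3):
    # area within k standard deviations of the mean (same for every normal distribution)
    print(k, norm.cdf(k) - norm.cdf(-k))   # ~0.683, ~0.954, ~0.997
```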
Standard Normal Distribution


✤   The standard normal distribution is the normal distribution with mean μ = 0
    and standard deviation σ = 1: Z ~ N(0, 1)

✤   Any normal distribution can be transformed into a standard normal
    distribution. If X ~ N(μ, σ), then:
                              Z = (X − μ) / σ




                                     21
Z - Scores & the Standard Normal

✤   Each observation has an associated z-score, which is the number of
    standard deviations that observation is away from the mean

✤   Converting a sample from a normal distribution to z-scores transforms
    it to a standard normal distribution

    ✤   z-score = (observation - mean) ÷ standard deviation

✤   If the observation is above the mean then the Z-score is positive, if
    below then the Z-score is negative

                                         22
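
A small sketch of the conversion (added for illustration; the sample values below are made up):

```python
from statistics import mean, stdev

sample = [12.0, 15.5, 9.8, 14.2, 11.1, 13.6]   # hypothetical observations
m, s = mean(sample), stdev(sample)

z_scores = [(x - m) / s for x in sample]       # (observation - mean) / standard deviation
for x, z in zip(sample, z_scores):
    print(f"{x:5.1f} -> z = {z:+.2f}")         # positive above the mean, negative below
```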
Interval Estimation

✤   We might estimate the mean for an entire population using the mean of a
    small sample; this is called a point estimate.

✤   A confidence interval gives a range of “plausible” values for the
    population mean

       ✤   Usually reported as "mean ± wiggle room"

✤   Each interval has an associated level of confidence, usually written as
    a percent (95% being the most common)

       ✤   "I am 95% confident that the population mean is in this range,
           with the sample mean being the most likely guess"
                                      23
Two-Sided: 1.96 Std. Dev.’s




                 24
Normal Critical Deviates

✤   Critical normal deviate: If you wanted to find the middle X% of the
    distribution, how many standard deviations would you have to travel
    in each direction?

✤   Define zγ to be the point for which the area under the normal curve to
    the right is γ.

✤   In more mathematical notation, zγ is the point for which:

                             P(Z > zγ) = γ
                                         25
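
In code (a sketch assuming scipy), zγ can be obtained from the inverse CDF: since norm.ppf works with the area to the left, zγ = norm.ppf(1 - γ).

```python
from scipy.stats import norm

for gamma in (0.10, 0.05, 0.025, 0.005):
    z_gamma = norm.ppf(1 - gamma)        # point with area gamma to its right
    print(gamma, round(z_gamma, 3))      # 1.282, 1.645, 1.960, 2.576
```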
Interpreting Confidence Intervals

✤   The width of a confidence interval indicates precision

    ✤   An observation's z-score indicates how unusual it is; a z-score beyond
        ±1.96 places it outside the central 95% of the distribution

✤   95% confidence intervals are by far the most common, but any level of
    confidence interval can be computed:

    ✤   90%: mean ± (1.645 × standard error)

    ✤   95%: mean ± (1.96 × standard error)

    ✤   99%: mean ± (2.58 × standard error)
                                        26
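
A minimal sketch of computing these intervals (the sample values are made up, scipy is assumed to be available, and the multiplier is applied to the standard error of the mean):

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import norm

sample = [98.2, 101.5, 99.7, 100.4, 97.9, 102.3, 100.8, 99.1]   # hypothetical data
m  = mean(sample)
se = stdev(sample) / sqrt(len(sample))                          # standard error of the mean

for level in (0.90, 0.95, 0.99):
    z = norm.ppf(1 - (1 - level) / 2)        # 1.645, 1.96, 2.58
    print(f"{level:.0%}: {m:.2f} ± {z * se:.2f}")
```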
Components of Confidence


✤   How might a confidence interval change as:

    ✤   Ȳ increases

    ✤   σ increases

    ✤   n increases

    ✤   the confidence level increases (e.g., from 95% to 99%)


                                      27
Conflicting Hypotheses

✤   In statistical inference, there are always two conflicting hypotheses:

    ✤   null hypothesis “H0” - often states “no effect” or “no difference”.
        This is the hypothesis that we will assume to be true unless we
        have convincing evidence to the contrary.

    ✤   alternative hypothesis “H1” or “Ha” - The hypothesis that we will
        believe only if the evidence strongly supports it.



✤   The null hypothesis typically has “=” in it
                                        28
Hypothesis as Metaphor

✤   Hypothesis tests are like U.S. criminal trials

✤   The judicial system is structured such that the accused person is
    presumed innocent until proven guilty. In such a system the absence
    of convincing evidence (“beyond a reasonable doubt”) results in the
    person being set free.

    ✤   H0: innocent

    ✤   Ha: guilty

                                       29
P-values


✤   In each hypothesis testing situation we will compute a p-value. This is
    the probability of obtaining data at least as extreme as ours if the null
    hypothesis were true.

    ✤   Accept H0 if the p-value is large

    ✤   Reject H0 if the p-value is small and go with Ha instead

✤   How small is small enough? It depends... (usually p < 0.05)



                                        30
Notes on Hypothesis Testing

✤   “Statistical significance” is not the same as “clinical significance”. A
    tiny effect may be “statistically significant” if the sample size is huge.

✤   The p-value does not describe the magnitude of the effect!

✤   When reporting analysis results, a confidence interval should always
    be provided along with the results of a hypothesis test.

✤   The choice of 0.05 is arbitrary. (p = 0.051 and p = 0.049 should lead to
    similar conclusions; in practice they often do not)

✤   Never report results as “p < 0.05”, report the p-value and let the
    reader decide if they agree with your interpretation.
                                         31
• Type I Error: Reject H0 when H0 is actually true.
  – For example, to conclude there is an effect (or a difference)
    when there really isn’t one.
  – Also called “false positive”.
• Type II Error: Accept H0 when H0 is actually false.
  – For example, to fail to find an effect (or a difference) when
    there really is one.
  – Also called “false negative”.

                               State of nature
              Decision      H0 is true      Ha is true
             Accept H0      correct         Type II error
             Reject H0      Type I error    correct
                                32
Probabilities of Errors of Type I and Type II

✤   Each of the errors has an associated probability:

        • α = P(Type I Error)
        • β = P(Type II Error)

✤   Hypothesis testing is set up to control the Type I error rate (α)

    ✤   The experimenter chooses α - everything else follows from this!

✤   Most common (by far) choice for α is 0.05.

    ✤   (Also, 0.01 and 0.10 on occasion)
                                          33
Comparing Means

✤   Tests:

    ✤   Single group versus a fixed mean

    ✤   Two groups measured on the same variable

    ✤   Two groups with paired observations

✤   Hypotheses:

    ✤   H0 : the two groups have equal means ( mean A = mean B )

    ✤   Ha : the means of the groups are different
                                       34
Assumptions for t-Tests

    ✤   The group (sample) is the Independent Variable (dichotomous)

    ✤   The outcome of interest is the Dependent Variable

✤   t-Tests are only valid if these assumptions are not violated:

    ✤   The research question DOES involve the comparison of 2 means

    ✤   The Dependent Variable is a quantitative scale

    ✤   The distribution of the Dependent Variable is normal

    ✤   Independent Variable assigned randomly (independently)
                                      35
Met Assumptions, but Which Test?


✤   Only one group with data: One-Sample t-Test

✤   Two groups:

    ✤   Not related to each other: Independent-Samples t-Test

    ✤   Related samples (e.g. before & after): Paired-Samples t-Test



                                       36
One-Sample t-Test


✤   Compares a sample mean to a known population mean.

✤   Need to know the population mean!



✤   Example: Is there a difference between the population mean IQ (100)
    and the mean IQ for a sample of 50 John Jay students (125)?



                                    37
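
A sketch of how this test might be run (the sample here is simulated to mimic the slide's numbers, and scipy is assumed to be available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
iq_sample = rng.normal(loc=125, scale=15, size=50)   # hypothetical sample of 50 students

t_stat, p_value = stats.ttest_1samp(iq_sample, popmean=100)   # compare to known mean of 100
print(t_stat, p_value)        # reject H0 (population mean = 100) if p_value < 0.05
```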
Paired-Samples t-Test

✤   Sometimes we have two sets of measurements that are related:

    ✤   Each subject is measured before and after treatment

    ✤   With pairs of identical twins

    ✤   Subject has different treatment on left & right arms



✤   For each observation in one group there is exactly one closely related
    observation in the other group (the observations form pairs, one from each group)
                                        38
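
A sketch of a paired analysis (the before/after values below are invented for illustration, and scipy is assumed):

```python
from scipy import stats

before = [140, 152, 138, 145, 160, 155, 149, 142]   # hypothetical measurements per subject
after  = [132, 148, 135, 140, 151, 150, 146, 139]   # same subjects after treatment

t_stat, p_value = stats.ttest_rel(before, after)    # paired-samples t-test
print(t_stat, p_value)
```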
Independent-Samples t-Test

✤   Compares the means of two groups or samples.

✤   One of the most common situations in statistical inference is that of
    comparing two means from independent samples

    ✤   Clinical trials - treatment group vs. placebo group

    ✤   Exposed vs. unexposed

    ✤   Males vs. females

    ✤   General population vs. specific subpopulation
                                       39
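
A sketch with two invented groups (again assuming scipy):

```python
from scipy import stats

treatment = [5.1, 4.8, 6.2, 5.9, 5.4, 6.0, 5.7]        # hypothetical outcomes, treatment group
placebo   = [4.2, 4.9, 4.5, 5.0, 4.3, 4.7, 4.4]        # hypothetical outcomes, placebo group

t_stat, p_value = stats.ttest_ind(treatment, placebo)  # independent-samples t-test
print(t_stat, p_value)
```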
Review: Hypotheses


✤   Null Hypothesis: there is no relationship between the independent
    and dependent variables

✤   p-value: the probability of data at least as extreme as ours if the null
    hypothesis (H0) were true

    ✤   Reject H0 if p is too small (usually p < 0.05)

    ✤   If we reject H0, we must instead choose the alternative (Ha)



                                        40
Review: t-Tests

✤   Compare the means of exactly two groups



✤   Only one group (with data) compared to a fixed number:

    ✤   One-Sample t-Test

✤   Two groups (with data):

    ✤   Not related to each other: Independent-Samples t-Test

    ✤   Related samples (e.g. before & after): Paired-Samples t-Test
                                       41
