Introduction to Statistics
STA250

Lecture 11 - April 21st, 2010
                                1
Probability


✤   How we express likelihood mathematically

✤   For an event “A”, the probability of A occurring is denoted “P(A)”

✤   Always a number between 0 and 1

    ✤   P(A) = 0 means that A never happens

    ✤   P(A) = 1 means that A always happens


                                     4
Independence & Exclusivity

✤   independence - A and B are independent if the occurrence of one does
    not affect the probability of the other:

    ✤   P(A|B) = P(A) = P(A|not B)

    ✤   P(B|A) = P(B) = P(B|not A)

✤   mutually exclusive - A and B are mutually exclusive if it is impossible
    for both of them to occur:

    ✤   P(A and B) = 0

                                       5
Probability Rules

✤   Probability of not happening is 1 minus probability of occurring

    ✤   P(not A) = 1 - P(A)



✤   When A and B are independent:

    ✤   P(A and B) = P(A) × P(B)



✤   P(A or B) = P(A) + P(B) - P(A and B)
                                     6
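
The three rules above can be checked numerically. Below is a minimal simulation sketch (not from the slides; the probabilities p_a and p_b are arbitrary example values) for two independent events:

```python
import random

random.seed(0)
p_a, p_b = 0.30, 0.50          # arbitrary example probabilities
trials = 100_000

not_a = a_and_b = a_or_b = 0
for _ in range(trials):
    a = random.random() < p_a   # event A
    b = random.random() < p_b   # event B, generated independently of A
    not_a   += (not a)
    a_and_b += (a and b)
    a_or_b  += (a or b)

print(not_a / trials,   "vs", 1 - p_a)                # P(not A) = 1 - P(A)
print(a_and_b / trials, "vs", p_a * p_b)              # P(A and B) = P(A) × P(B)
print(a_or_b / trials,  "vs", p_a + p_b - p_a * p_b)  # P(A or B) = P(A) + P(B) - P(A and B)
```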
Probability Fundamentals


✤   Sum of probabilities of all possible outcomes is 1

✤   Flip a coin and you get either heads or tails:

    ✤   P(heads) + P(tails) = 1 = P(heads or tails)

✤   With mutually exclusive outcomes A, B, C, and D covering all possible outcomes

    ✤   P(A) + P(B) + P(C) + P(D) = 1 = P(A or B or C or D)


                                        7
Conditional Probability

✤   With non-independent events, knowing one has happened may change
    the likelihood of the other occurring

✤   Conditional probability - what is the probability of A given that B has
    already happened?

    ✤   P(A|B)

✤   Bayes Rule for conditional probability:
                 P(A|B) = P(A and B) / P(B) = [P(B|A) × P(A)] / P(B)
                                       8
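
As a minimal sketch of the first equality (added here for illustration, not part of the slides), take one fair die with A = "roll is even" and B = "roll is 4 or more" and verify P(A|B) = P(A and B) / P(B) by counting outcomes:

```python
from fractions import Fraction

outcomes = set(range(1, 7))                 # one fair die
A = {x for x in outcomes if x % 2 == 0}     # A: roll is even
B = {x for x in outcomes if x >= 4}         # B: roll is 4 or more

p_b         = Fraction(len(B), 6)
p_a_and_b   = Fraction(len(A & B), 6)
p_a_given_b = p_a_and_b / p_b               # definition of conditional probability

print(p_a_given_b)                          # 2/3
```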
Conditional Probability Hoedown


✤   At John Jay, 62.5% of all students hate statistics while 25% of all
    students hate statistics and passed the class. What is the probability
    that a student passes stats given that the student hates statistics?



✤   Two fair dice are rolled, what is the (conditional) probability that
    exactly one die’s value is a 1 or 2 given that they show different
    numbers?


                                       9
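
Both exercises can be checked with the conditional-probability formula and a brute-force enumeration of the 36 dice outcomes. This is a sketch added for illustration, not part of the lecture:

```python
from fractions import Fraction
from itertools import product

# John Jay example: P(pass | hate) = P(pass and hate) / P(hate)
p_hate          = Fraction(625, 1000)    # 62.5% hate statistics
p_hate_and_pass = Fraction(25, 100)      # 25% hate statistics and passed
print(p_hate_and_pass / p_hate)          # 2/5 = 0.4

# Dice example: condition on the two dice showing different numbers
different   = [(a, b) for a, b in product(range(1, 7), repeat=2) if a != b]
exactly_one = [(a, b) for a, b in different
               if (a in (1, 2)) != (b in (1, 2))]    # exactly one die shows 1 or 2
print(Fraction(len(exactly_one), len(different)))    # 8/15
```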
Something Really Important



✤   Classic stats problem emerged from the game show Let’s Make a Deal,
    often called the Monty Hall Problem after the show’s host

✤   Has ended many friendships and caused bitter internet arguments




                                    10
The Game


✤   There are 3 doors labeled “1”, “2”, and “3”, behind one of these doors
    is a fabulous prize that Monty has hidden

✤   You get to choose a door, which may or may not have the prize

✤   Monty opens another door without revealing the prize

✤   You now have the option to stay with your door or switch to another,
    should you stick with your original choice or switch?


                                     11
Choosing The First Door



✤   Three doors and one prize so you’ll pick the right door one out of
    three times, i.e. P(right first choice) = 1/3

✤   Likewise, you’ll pick the wrong door with P(wrong first choice) = 2/3




                                     12
The Reveal


✤   No matter how you choose, there are two other doors and only one prize.
    This means at least one of the two unchosen doors has nothing behind it.

✤   Monty knows where the prize is and opens an unchosen door that DOESN’T
    have the prize behind it.

✤   This leaves your door and one other. One of them has the prize and
    the other doesn’t, should you switch?


                                    13
To Switch, or Not To Switch


✤   You don’t know if you have the right door!

✤   What’s the probability that your door has the prize?

✤   What’s the probability that the other door has the prize?

✤   What’s the probability that your door doesn’t have the prize?



                                     14
Example of the Game


✤   As an example, the prize is hidden behind door “3”.

✤   If you choose door “3” initially, switching can only lose you the prize

✤   If you choose door “2” initially, Monty must open door “1” and
    switching will get you the prize

✤   If you choose door “1” initially, Monty must open door “2” and
    switching will get you the prize


                                      15
Switch Already!


✤   Switching is a way of saying “I don’t think the prize is behind this
    door”

✤   Since the probability is 1/3 that the prize is behind any one door, the
    probability is 2/3 that the prize is not behind that door

✤   Always switch and you’ll win 2/3 of the time!



                                      16
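
A short simulation (a sketch added here, not part of the original slides) makes the 1/3 versus 2/3 split easy to see:

```python
import random

random.seed(1)
trials = 100_000
stay_wins = switch_wins = 0

for _ in range(trials):
    prize  = random.choice([1, 2, 3])
    choice = random.choice([1, 2, 3])
    # Monty opens a door that is neither the player's choice nor the prize
    opened   = random.choice([d for d in (1, 2, 3) if d != choice and d != prize])
    switched = next(d for d in (1, 2, 3) if d != choice and d != opened)
    stay_wins   += (choice == prize)
    switch_wins += (switched == prize)

print("stay:  ", stay_wins / trials)    # close to 1/3
print("switch:", switch_wins / trials)  # close to 2/3
```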
Expected Values

✤   Probability can be used to estimate rewards in a game of chance

    ✤   Expected Value = P(A)×Reward(A) + P(B)×Reward(B) + ...



✤   Silly coin-flipping game: flip a coin three times; if you get exactly one
    Heads, you get a dollar. If not, you give me a dollar.

✤   Should you take the bet?

                                     17
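
Working the coin game out (a sketch added for illustration): there are 8 equally likely sequences of three flips, 3 of which contain exactly one Heads, so the expected value is (3/8)(+$1) + (5/8)(-$1) = -$0.25 per play.

```python
from fractions import Fraction
from itertools import product

flips = list(product("HT", repeat=3))                 # 8 equally likely sequences
p_win = Fraction(sum(seq.count("H") == 1 for seq in flips), len(flips))

expected_value = p_win * 1 + (1 - p_win) * (-1)       # win $1 with p_win, else lose $1
print(p_win, expected_value)                          # 3/8 and -1/4: don't take the bet
```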
Normal Distribution

✤   The distribution is

    ✤   unimodal

    ✤   symmetric

    ✤   “light tailed”

✤   Notation: X ~ N(μ, σ) means “the random variable X has a normal
    distribution with mean μ and standard deviation σ”

                                   18
Area Under the Curve Equals 1

            [Figure: density curves f(x) for N(−3, 0.5), N(2, 1), and N(−1, 3),
            plotted for x from −4 to 4]
                            19
Rules of Thumb


✤   P(within one standard deviation) = 0.68

✤   P(within two standard deviations) = 0.95

✤   P(within three standard deviations) = 0.997



✤   With “real” normal distributions, you just don’t get outliers!


                                      20
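
The rounded values above can be checked against exact normal areas. A minimal sketch, assuming scipy is available:

```python
from scipy.stats import norm

for k in (1, 2, 3):
    # area within k standard deviations of the mean (same for every normal distribution)
    print(k, norm.cdf(k) - norm.cdf(-k))   # ~0.683, ~0.954, ~0.997
```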
Standard Normal Distribution


✤   The standard normal distribution is the normal distribution with mean μ = 0
    and standard deviation σ = 1: Z ~ N(0, 1)

✤   Any normal distribution can be transformed into a standard normal
    distribution. If X ~ N(μ, σ), then:
                              Z = (X − μ) / σ




                                     21
Z - Scores & the Standard Normal

✤   Each observation has an associated z-score, which is the number of
    standard deviations that observation is away from the mean

✤   Converting a sample from a normal distribution to z-scores transforms
    it to a standard normal distribution

    ✤   z-score = (observation - mean) ÷ standard deviation

✤   If the observation is above the mean then the Z-score is positive, if
    below then the Z-score is negative

                                         22
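
A small sketch of the conversion (added for illustration; the sample values below are made up):

```python
from statistics import mean, stdev

sample = [12.0, 15.5, 9.8, 14.2, 11.1, 13.6]   # hypothetical observations
m, s = mean(sample), stdev(sample)

z_scores = [(x - m) / s for x in sample]       # (observation - mean) / standard deviation
for x, z in zip(sample, z_scores):
    print(f"{x:5.1f} -> z = {z:+.2f}")         # positive above the mean, negative below
```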
Interval Estimation

✤   We might estimate the mean for an entire population using the mean of a
    small sample; this is called a point estimate.

✤   A confidence interval gives a range of “plausible” values for the
    population mean

       ✤   Usually reported as "mean ± wiggle room"

✤   Each interval has an associated level of confidence, usually written as
    a percent (95% being the most common)

       ✤   "I am 95% confident that the population mean is in this range,
           with the sample mean being the most likely guess"
                                      23
Two-Sided: 1.96 Std. Dev.’s




                 24
Normal Critical Deviates

✤   Critical normal deviate: If you wanted to find the middle X% of the
    distribution, how many standard deviations would you have to travel
    in each direction?

✤   Define zγ to be the point for which the area under the normal curve to
    the right is γ.

✤   In more mathematical notation, zγ is the point for which:

                             P(Z > zγ) = γ
                                         25
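
In code (a sketch assuming scipy), zγ can be obtained from the inverse CDF: since norm.ppf works with the area to the left, zγ = norm.ppf(1 - γ).

```python
from scipy.stats import norm

for gamma in (0.10, 0.05, 0.025, 0.005):
    z_gamma = norm.ppf(1 - gamma)        # point with area gamma to its right
    print(gamma, round(z_gamma, 3))      # 1.282, 1.645, 1.960, 2.576
```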
Interpreting Confidence Intervals

✤   The width of a confidence interval indicates precision

    ✤   An observation's z-score indicates how unusual it is; a z-score beyond
        ±1.96 places it outside the central 95% of the distribution

✤   95% confidence intervals are by far the most common, but any level of
    confidence interval can be computed:

    ✤   90%: mean ± (1.645 × standard error)

    ✤   95%: mean ± (1.96 × standard error)

    ✤   99%: mean ± (2.58 × standard error)
                                        26
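
A minimal sketch of computing these intervals (the sample values are made up, scipy is assumed to be available, and the multiplier is applied to the standard error of the mean):

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import norm

sample = [98.2, 101.5, 99.7, 100.4, 97.9, 102.3, 100.8, 99.1]   # hypothetical data
m  = mean(sample)
se = stdev(sample) / sqrt(len(sample))                          # standard error of the mean

for level in (0.90, 0.95, 0.99):
    z = norm.ppf(1 - (1 - level) / 2)        # 1.645, 1.96, 2.58
    print(f"{level:.0%}: {m:.2f} ± {z * se:.2f}")
```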
Components of Confidence


✤   How might a confidence interval change as:

    ✤   Ȳ increases

    ✤   σ increases

    ✤   n increases

    ✤   the confidence level increases (e.g., from 95% to 99%)


                                      27
Conflicting Hypotheses

✤   In statistical inference, there are always two conflicting hypotheses:

    ✤   null hypothesis “H0” - often states “no effect” or “no difference”.
        This is the hypothesis that we will assume to be true unless we
        have convincing evidence to the contrary.

    ✤   alternative hypothesis “H1” or “Ha” - The hypothesis that we will
        believe only if the evidence strongly supports it.



✤   The null hypothesis typically has “=” in it
                                        28
Hypothesis as Metaphor

✤   Hypothesis tests are like U.S. criminal trials

✤   The judicial system is structured such that the accused person is
    presumed innocent until proven guilty. In such a system the absence
    of convincing evidence (“beyond a reasonable doubt”) results in the
    person being set free.

    ✤   H0: innocent

    ✤   Ha: guilty

                                       29
P-values


✤   In each hypothesis testing situation we will compute a p-value. This is
    the probability of obtaining data at least as extreme as ours if the null
    hypothesis were true.

    ✤   Accept H0 if the p-value is large

    ✤   Reject H0 if the p-value is small and go with Ha instead

✤   How small is small enough? It depends... (usually p < 0.05)



                                        30
Notes on Hypothesis Testing

✤   “Statistical significance” is not the same as “clinical significance”. A
    tiny effect may be “statistically significant” if the sample size is huge.

✤   The p-value does not describe the magnitude of the effect!

✤   When reporting analysis results, a confidence interval should always
    be provided along with the results of a hypothesis test.

✤   The choice of 0.05 is arbitrary. (p = 0.051 and p = 0.049 should lead to
    similar conclusions; in practice they often do not)

✤   Never report results as “p < 0.05”, report the p-value and let the
    reader decide if they agree with your interpretation.
                                         31
• Type I Error: Reject H0 when H0 is actually true.
  – For example, to conclude there is an effect (or a difference)
    when there really isn’t one.
  – Also called “false positive”.
• Type II Error: Accept H0 when H0 is actually false.
  – For example, to fail to find an effect (or a difference) when
    there really is one.
  – Also called “false negative”.

                               State of nature
              Decision      H0 is true      Ha is true
             Accept H0      correct         Type II error
             Reject H0      Type I error    correct
                                32
Probabilities of Errors of Type I and Type II

✤   Each of the errors has an associated probability:

        • α = P(Type I Error)
        • β = P(Type II Error)

✤   Hypothesis testing is set up to control the Type I error rate (α)

    ✤   The experimenter chooses α - everything else follows from this!

✤   Most common (by far) choice for α is 0.05.

    ✤   (Also, 0.01 and 0.10 on occasion)
                                          33
Comparing Means

✤   Tests:

    ✤   Single group versus a fixed mean

    ✤   Two groups measured on the same variable

    ✤   Two groups with paired observations

✤   Hypotheses:

    ✤   H0 : the two groups have equal means ( mean A = mean B )

    ✤   Ha : the means of the groups are different
                                       34
Assumptions for t-Tests

    ✤   The group (sample) is the Independent Variable (dichotomous)

    ✤   The outcome of interest is the Dependent Variable

✤   t-Tests are only valid if these assumptions are not violated:

    ✤   The research question DOES involve the comparison of 2 means

    ✤   The Dependent Variable is a quantitative scale

    ✤   The distribution of the Dependent Variable is normal

    ✤   Independent Variable assigned randomly (independently)
                                      35
Met Assumptions, but Which Test?


✤   Only one group with data: One-Sample t-Test

✤   Two groups:

    ✤   Not related to each other: Independent-Samples t-Test

    ✤   Related samples (e.g. before & after): Paired-Samples t-Test



                                       36
One-Sample t-Test


✤   Compares a sample mean to a known population mean.

✤   Need to know the population mean!



✤   Example: Is there a difference between the population mean IQ (100)
    and the mean IQ for a sample of 50 John Jay students (125)?



                                    37
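
A sketch of how this test might be run (the sample here is simulated to mimic the slide's numbers, and scipy is assumed to be available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
iq_sample = rng.normal(loc=125, scale=15, size=50)   # hypothetical sample of 50 students

t_stat, p_value = stats.ttest_1samp(iq_sample, popmean=100)   # compare to known mean of 100
print(t_stat, p_value)        # reject H0 (population mean = 100) if p_value < 0.05
```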
Paired-Samples t-Test

✤   Sometimes we have two sets of measurements that are related:

    ✤   Each subject is measured before and after treatment

    ✤   With pairs of identical twins

    ✤   Subject has different treatment on left & right arms



✤   For each observation in one group there is exactly one closely related
    observation in the other group (the observations form pairs, one from each group)
                                        38
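
A sketch of a paired analysis (the before/after values below are invented for illustration, and scipy is assumed):

```python
from scipy import stats

before = [140, 152, 138, 145, 160, 155, 149, 142]   # hypothetical measurements per subject
after  = [132, 148, 135, 140, 151, 150, 146, 139]   # same subjects after treatment

t_stat, p_value = stats.ttest_rel(before, after)    # paired-samples t-test
print(t_stat, p_value)
```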
Independent-Samples t-Test

✤   Compares the means of two groups or samples.

✤   One of the most common situations in statistical inference is that of
    comparing two means from independent samples

    ✤   Clinical trials - treatment group vs. placebo group

    ✤   Exposed vs. unexposed

    ✤   Males vs. females

    ✤   General population vs. specific subpopulation
                                       39
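
A sketch with two invented groups (again assuming scipy):

```python
from scipy import stats

treatment = [5.1, 4.8, 6.2, 5.9, 5.4, 6.0, 5.7]        # hypothetical outcomes, treatment group
placebo   = [4.2, 4.9, 4.5, 5.0, 4.3, 4.7, 4.4]        # hypothetical outcomes, placebo group

t_stat, p_value = stats.ttest_ind(treatment, placebo)  # independent-samples t-test
print(t_stat, p_value)
```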
Review: Hypotheses


✤   Null Hypothesis: there is no relationship between the independent
    and dependent variables

✤   p-value: the probability of data at least as extreme as ours if the null
    hypothesis (H0) were true

    ✤   Reject H0 if p is too small (usually p < 0.05)

    ✤   If we reject H0, we must instead choose the alternative (Ha)



                                        40
Review: t-Tests

✤   Compare the means of exactly two groups



✤   Only one group (with data) compared to a fixed number:

    ✤   One-Sample t-Test

✤   Two groups (with data):

    ✤   Not related to each other: Independent-Samples t-Test

    ✤   Related samples (e.g. before & after): Paired-Samples t-Test
                                       41
