HYPOTHESIS TESTING:
Inference on Proportions I
Dr. O. J. AKINSOLA
Department of Community Health and Primary Care,
College of Medicine
University of Lagos
LAGOS
Recall that the choice of a test statistic is dictated by
the following:
• the study objectives
• the type of data, usually indicated by the level of
measurement
• the type of study design (sample selection methods,
data collection techniques, matching, dependent and
independent samples, etc.)
• the power of the test required to detect a true
difference
The power of a test depends on the assumptions underlying the
statistical model for the test statistic. Remember also the
difference between the power of a test and the level of
significance.
The level of significance (α) is the risk we are willing to
take of rejecting the null hypothesis when it is in fact true;
the p-value computed from the data is compared with α. The
power of the test, in contrast, is the ability of a study to
detect a difference between two or more treatment groups if
the alternative hypothesis is true. Power depends on the
magnitude of the difference to be detected, the sample size,
the variation in the measurements and the level of
significance (α).
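The dependence of power on the difference, the sample size and α can be illustrated with a minimal sketch. It assumes a two-sided z-test comparing two proportions with equal group sizes and uses the normal approximation; the function name and all the proportions are hypothetical, chosen only to show the direction of each effect.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Normal approximation with equal group sizes; the negligible chance of
    rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return z.cdf(abs(p1 - p2) / se - z_crit)


# Hypothetical scenarios: vary one ingredient at a time.
base = approx_power_two_proportions(0.30, 0.15, n_per_group=100)
bigger_sample = approx_power_two_proportions(0.30, 0.15, n_per_group=200)
smaller_difference = approx_power_two_proportions(0.30, 0.25, n_per_group=100)
stricter_alpha = approx_power_two_proportions(0.30, 0.15, n_per_group=100, alpha=0.01)

print(round(base, 2), round(bigger_sample, 2),
      round(smaller_difference, 2), round(stricter_alpha, 2))
```

Under these assumptions, a larger sample raises the power, while a smaller difference or a stricter α lowers it.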
Four Most Common Statistical Tests for Categorical Data
• Chi-Square Test of Independence
• Chi-Square Goodness of Fit Test
• Fisher’s Exact Test
• McNemar’s Test
Chi-Square Test of Independence
• This test checks whether two categorical variables are
independent or related, that is, whether knowing the category
of one variable tells us anything about the category of the
other.
Chi-Square Goodness of Fit Test
• The test compares observed data with expected data to see
if they match. It is used to test how well the data fit a
particular distribution.
Fisher’s Exact Test
• It is accurate for small sample sizes and tests the
relationship between two categorical variables. It is similar
to the Chi-Square Test of Independence but better suited to
small data sets.
McNemar’s Test
• This test is specifically designed for paired data and is
often used in before-and-after scenarios to detect changes in
proportions within the same group over time.
• McNemar’s Test is a test on a 2x2 contingency table. It
checks the marginal homogeneity of two dichotomous variables.
• It is used when the data of the two groups come from the
same participants, i.e. paired data.
• Paired data usually arise through matching, which increases
validity by controlling for confounders.
• For example, it is used to analyze tests performed before
and after treatment in a population.
Chi-Square Test of Independence
For categorical data, the chi-square test is useful to
test the significance of association between any two
categorical variables. Recall that nominal data
expressed as proportions can be compared using the
Z-test for proportions.
The procedure is to estimate the difference between
the two proportions and divide this by the standard error
of the difference to obtain the calculated value of Z,
which is then referred to the Z-table to obtain the
p-value. At the 5% level of significance (two-sided), the
critical value is 1.96, so any calculated value beyond
Z = ±1.96 falls in the critical region and leads to the
conclusion that the observed difference is statistically
significant.
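The Z-test procedure just described can be sketched as follows. This is a minimal illustration, assuming the standard error is computed from the pooled proportion under the null hypothesis; the function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0 (assumption)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 30 of 100 in one group versus 15 of 100 in the other.
z, p_value = two_proportion_z_test(30, 100, 15, 100)
significant = abs(z) > 1.96  # 5% two-sided critical value
print(round(z, 2), round(p_value, 4), significant)
```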
A statistical test for the same purpose is the Chi-square
test. This test requires that the data be expressed as
frequencies rather than proportions and presented in
contingency tables. The size of a contingency table is
determined by the number of categories of each classifying
nominal variable. A special characteristic of the Chi-square
test is its dependence on the degrees of freedom. In fact the
chi-square distribution approaches the normal distribution as
the degrees of freedom increase. An r x c contingency table,
which has r rows and c columns, has (r-1) x (c-1) degrees of
freedom. Areas under the chi-square distribution have been
calculated and tabulated for the corresponding degrees of
freedom.
Once the chi-square test statistic is evaluated, we only need
to read off the significance level directly from this table,
or compare the computed value with the tabulated value at a
chosen level of significance.
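The table lookup can be sketched in a few lines. The dictionary below holds the commonly tabulated 5% critical values for 1 to 4 degrees of freedom; the function names are illustrative only.

```python
# Tabulated upper-tail 5% critical values of the chi-square distribution.
CHI2_CRITICAL_5PCT = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}


def contingency_df(rows, cols):
    """Degrees of freedom of an r x c contingency table."""
    return (rows - 1) * (cols - 1)


def is_significant_at_5pct(chi2_value, df):
    """Compare a computed chi-square value with the tabulated critical value."""
    return chi2_value > CHI2_CRITICAL_5PCT[df]


df_2x2 = contingency_df(2, 2)  # a 2 x 2 table has 1 degree of freedom
df_3x3 = contingency_df(3, 3)  # a 3 x 3 table has 4 degrees of freedom
print(df_2x2, df_3x3, is_significant_at_5pct(4.0, df_2x2))
```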
Example 1
Suppose 14 of the 60 patients seen in one day in a rural
clinic in the Northern region of Nigeria had irritable
bowel syndrome (IBS), and 4 of the 50 patients seen on the
same day in another rural clinic in the Southern region had
IBS.
Can we say that the number of patients with irritable bowel
syndrome is contingent on the location of the clinic?
The results are presented in the 2 x 2 table as follows:
REGION      IBS: Yes   IBS: No   TOTAL
Northern       14         46       60
Southern        4         46       50
TOTAL          18         92      110
The procedure to test the association between IBS
and REGION follows these steps.
Step 1: Ho: There is no association between Region
and presentation with irritable bowel syndrome.
Step 2: HA: There is an association between Region
and presentation with irritable bowel syndrome.
Step 3: Level of significance (α = 5%)
Step 4: $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
where $O_i$ are the observed frequencies and $E_i$ are the
expected frequencies in the cells of the table if the
null hypothesis were true. We now need to calculate
these expected frequencies for each cell of the 2 x 2
table under the null hypothesis,
i.e. $E_i = \frac{R_T \times C_T}{G_T}$
where $R_T$ is the row total, $C_T$ the column total and
$G_T$ the grand total.
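As a minimal sketch, the expected frequencies for the IBS table can be computed from the row, column and grand totals. The function name is illustrative.

```python
# Observed counts from the IBS by REGION table.
observed = [[14, 46],  # Northern: IBS yes, IBS no
            [4, 46]]   # Southern: IBS yes, IBS no


def expected_frequencies(table):
    """Expected cell counts under independence: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
# [9.82, 50.18]
# [8.18, 41.82]
```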
Oi      Ei       (Oi-Ei)^2   (Oi-Ei)^2/Ei
14       9.82     17.47        1.77
46      50.18     17.47        0.35
 4       8.18     17.47        2.14
46      41.82     17.47        0.42
TOTAL    -         -           4.68
= 4.68 on 1 degree of freedom. The tabulated chi-
square value on 1 degree of freedom at 5% level is
3.84. We can see that the calculated chi-square value
is greater than the tabulated value at the 5% level.
That is, 4.68 > 3.84. Therefore, p < 0.05
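The calculation can be checked with a short sketch. It uses unrounded expected frequencies, so the statistic comes out at about 4.69; the 4.68 in the table results from summing rounded terms. The p-value relies on the fact that, for 1 degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt

observed = [[14, 46],  # Northern: IBS yes, IBS no
            [4, 46]]   # Southern: IBS yes, IBS no

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count
        chi2 += (o - e) ** 2 / e

# Upper-tail probability of the chi-square distribution on 1 degree of freedom.
p_value = erfc(sqrt(chi2 / 2))
print(round(chi2, 2), round(p_value, 3), chi2 > 3.84)
```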
Conclusion
We reject the null hypothesis that there is no difference in
the rate of irritable bowel syndrome between the Northern and
Southern clinics. In other words, there is a significant
association between REGION and IBS.
Example 2
In a study of the potential role of drug therapy in the
treatment of bladder instability in the elderly, 19
incontinent elderly patients received Imipramine and
14 received a placebo treatment. Of the Imipramine
patients, 14 became dry after treatment compared
with only 6 of the placebo patients.
• Show the data in an appropriate table
• Is there any evidence of genuine treatment
differences?
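Because the two groups are small (19 and 14 patients), Fisher's exact test is a natural candidate here. The sketch below is one possible approach, not the deck's own worked solution: it lays the data out as a 2 x 2 table and computes a two-sided exact p-value from the hypergeometric distribution by summing the probabilities of all tables no more likely than the observed one. The function name is illustrative.

```python
from math import comb

# Rows: Imipramine, placebo. Columns: dry, not dry.
a, b = 14, 5  # Imipramine: 14 dry, 5 not dry (19 patients)
c, d = 6, 8   # Placebo:     6 dry, 8 not dry (14 patients)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for a 2 x 2 table with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


p_value = fisher_exact_two_sided(a, b, c, d)
print(round(p_value, 3))
```

The chi-square procedure of Example 1 can be applied to the same table for comparison.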
Chi-Square Goodness of Fit Test
The chi-square test is also used to determine in
quantitative terms whether a set of observations follows a
particular probability distribution. We compare the
observed frequencies of individuals with the given
attribute with the frequencies we would expect if a
certain theory or probability distribution held.
Example 1
The data below give the number of spontaneous
abortions suffered by a sample of 71 women who had
been pregnant four times. If the risk of abortion were
independent of previous reproductive history and the
same for all women, the number of abortions out of
four pregnancies would follow a binomial distribution.
Test whether the number of abortions out of four
pregnancies follows a binomial distribution.
Number of spontaneous abortions    0    1    2    3    4
Number of women (observed)        24   29    7    5    6
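Since the total number of abortions is 82 among
4 x 71 = 284 pregnancies, the probability of abortion is
estimated as
$p = \frac{82}{4 \times 71} \approx 0.289$ (the slide gives
0.292, the value carried through the calculations below).
Using $\Pr(x) = \binom{n}{x} p^x q^{n-x}$ with n = 4 and
q = 1 - p, the expected probabilities of 0, 1, 2, 3 and 4
spontaneous abortions are: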
Pr (0 abortions) = $(1-0.292)^4$ = 0.251
Pr (1 abortion) = $4 \times 0.292 \times (1-0.292)^3$ = 0.415
Pr (2 abortions) = $6 \times (0.292)^2 (1-0.292)^2$ = 0.256
Pr (3 abortions) = $4 \times (0.292)^3 (1-0.292)$ = 0.071
Pr (4 abortions) = $(0.292)^4$ = 0.007
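These probabilities can be reproduced with a short sketch, taking n = 4 and the slide's value p = 0.292 as given.

```python
from math import comb

n, p = 4, 0.292  # four pregnancies, probability of abortion as used on the slide

# Binomial probability of x abortions out of n pregnancies, for x = 0..4.
probabilities = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

for x, prob in enumerate(probabilities):
    print(x, round(prob, 3))
```

Multiplying each probability by the 71 women gives the expected frequencies used in the goodness of fit test.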
Number of abortions   Observed   Expected
0                        24       17.821
1                        29       29.465
2                         7       18.176
3                         5        5.041
4                         6        0.497
TOTAL                    71       71
χ² = 69.951. With five categories and one parameter (p)
estimated from the data, the test has 5 - 1 - 1 = 3 degrees of
freedom, for which the critical value at the 5% level is 7.81.
The calculated value far exceeds this critical value.
Hence, there is strong evidence against the goodness of fit of
the binomial distribution to the observed distribution of
spontaneous abortions, p < 0.001. In other words, we reject the
null hypothesis in favour of the alternative hypothesis.
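A minimal sketch of this calculation, using the observed and expected frequencies from the table. The p-value uses the closed form of the chi-square upper tail for 3 degrees of freedom, erfc(sqrt(x/2)) + sqrt(2x/π)·exp(-x/2); the helper name is illustrative.

```python
from math import erfc, exp, pi, sqrt

observed = [24, 29, 7, 5, 6]
expected = [17.821, 29.465, 18.176, 5.041, 0.497]  # from the table

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Five categories, minus 1, minus 1 more because p was estimated from the data.
df = len(observed) - 1 - 1


def chi2_upper_tail_3df(x):
    """Upper-tail probability of the chi-square distribution on 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p_value = chi2_upper_tail_3df(chi2)
print(round(chi2, 2), df, p_value < 0.001)
```

Most of the statistic comes from the last category, where 6 women were observed against an expected 0.497.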
Example 2
Population statistics indicate that the chance of a newborn
being male is 0.52. A survey of 50 quadruplet births yielded
the following pattern of gender outcomes.
0 males    5
1 male    14
2 males   14
3 males   10
4 males    7
Test if these results follow a Binomial distribution. What does
this imply about the genders of the children in quadruplet
births?
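One way to set up this exercise is sketched below. Here p = 0.52 is given rather than estimated from the sample, so the sketch assumes 5 - 1 = 4 degrees of freedom and compares the statistic with the tabulated 5% value of 9.49. Variable names are illustrative.

```python
from math import comb

observed = [5, 14, 14, 10, 7]  # births with 0, 1, 2, 3, 4 males
n_children, p_male = 4, 0.52   # quadruplets; population probability of a male

total_births = sum(observed)   # 50
expected = [
    total_births * comb(n_children, k) * p_male**k * (1 - p_male) ** (n_children - k)
    for k in range(n_children + 1)
]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1          # p is given, so no extra degree of freedom is lost
exceeds_critical = chi2 > 9.49  # tabulated 5% critical value on 4 degrees of freedom

print([round(e, 2) for e in expected], round(chi2, 2), df, exceeds_critical)
```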
McNemar’s Test
• Procedure:
1. Enter
   a) the values of the 2x2 contingency table tabulating the
      outcomes of the 2 tests
   b) the value of 1-α, the two-sided confidence level
2. Click the button “Calculate” to obtain
   a) the test statistic and p-values (1 tail and 2 tails) of
      McNemar’s Test
   b) the Odds Ratio
3. Click the button “Reset” for a new calculation
                  Test 2 Positive   Test 2 Negative   Totals
Test 1 Positive         a                 b           n1 = a+b
Test 1 Negative         c                 d           n2 = c+d
Totals              m1 = a+c          m2 = b+d        N = n1+n2
The null hypothesis is $H_0: p_b = p_c$, i.e. the two kinds of discordant pair are equally likely (marginal homogeneity).
The alternative hypothesis is $H_A: p_b \neq p_c$.
The McNemar’s test statistic with Yates' continuity correction is:
$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$, with degrees of freedom = 1
The Odds Ratio is $OR = b/c$
Notation:
100(1-α)% confidence interval: We are 100(1-α)% confident that the true value of the parameter is
included in the confidence interval
$z_p$: The z-value for the standard normal distribution with left-tail probability $p$
Example 1
• To determine whether a drug influences the
disease, the result of diagnosis before and after the
treatment is tabulated in a 2x2 contingency table.
                   After: Positive   After: Negative   Totals
Before: Positive          7                13            20
Before: Negative          1                 8             9
Totals                    8                21            29
The test statistic is 8.64286 and the 1-tail and 2-tail p-values are
0.00164 and 0.00328 respectively.
Therefore, the null hypothesis is rejected at the 5% significance level.
The Odds Ratio is 13.
Example 2
• A study was carried out on post-menopausal women
in City A. Cases of women with endometrial cancer
were identified from this city. A control group was
selected, matched to the cases on age and length of
residence in City A. The medical question was whether
endometrial cancer was related to estrogen use.
                      Estrogen (Control)   No Estrogen (Control)   Totals
Estrogen (Cases)             27                    29                56
No Estrogen (Cases)           3                     4                 7
Totals                       30                    33                63
• The test statistic is 19.53 and the 2-tail p-value is
<0.00001. Therefore, the null hypothesis is rejected at the 5%
significance level.
• These data show a statistically significant association
between estrogen use and endometrial cancer (p<0.0001).
• The odds of endometrial cancer are approximately 10 times
greater for women who were on estrogen therapy than for
those who were not.
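The figures quoted in both McNemar examples can be reproduced from the discordant cells b and c alone. The sketch below assumes the Yates-corrected statistic (|b - c| - 1)² / (b + c) on 1 degree of freedom and the odds ratio b / c, which match the reported values; the function name is illustrative.

```python
from math import erfc, sqrt


def mcnemar_yates(b, c):
    """Yates-corrected McNemar statistic, two-sided p-value and odds ratio.

    b and c are the two discordant cells of the paired 2 x 2 table.
    """
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    p_two_sided = erfc(sqrt(statistic / 2))  # chi-square upper tail, 1 df
    odds_ratio = b / c
    return statistic, p_two_sided, odds_ratio


# Example 1: before positive / after negative = 13, before negative / after positive = 1.
example1 = mcnemar_yates(b=13, c=1)
# Example 2: case on estrogen / control not = 29, case not / control on estrogen = 3.
example2 = mcnemar_yates(b=29, c=3)

print([round(v, 5) for v in example1])
print(round(example2[0], 2), example2[1] < 0.0001, round(example2[2], 2))
```

In Example 2 the odds ratio is 29 / 3, about 9.7, which the slide rounds to "approximately 10".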
THANK YOU !!!
