MAS230 : Biostatistical Methods
                                Workshop 4




1. Consider Data Set 75. These data come from Judge, M.D. et al (1984). “Thermal
   shrinkage temperature of intramuscular collagen of bulls and steers,” Journal of
   Animal Science 59: 706–9, and are reproduced in Samuels and Witmer (1999),
   Statistics for Life Sciences, 2nd Edition, Prentice Hall, p. 357. The study is designed
   to assess the effect of electrical stimulation of a beef carcass in terms of improving
   the tenderness of the meat. In this test, beef carcasses were split in half. One
   side was subjected to a brief electrical current while the other was an untreated
   control. From each side a specimen of connective tissue (collagen) was taken, and
   the temperature at which shrinkage occurred was determined. Increased tenderness
   is related to a low shrinkage temperature.
   Carry out analyses to assess the impact of electrical stimulation on the meat
   tenderness. Use both parametric and non-parametric methods and compare the results.
   Suppose acceptable tenderness corresponds to a shrinkage temperature less than
   69 degrees. How would you test to see if the proportions of acceptable tenderness
   values differed under the two treatments? Use SPSS to create appropriate variables
   to enable this to be tested and carry out the analysis. Don’t forget that the sample
   sizes are small here.


   To assess the impact of electrical stimulation on meat tenderness, we will examine
   whether the mean temperature at which shrinkage occurs is different for the two
   groups. Since one half of each carcass is assigned to the untreated group, and
   the other half is assigned to the group that receives electrical stimulation, the two
   samples are not independent. Consequently, the most appropriate analysis is a test
   for paired differences.
   We might consider a t-test for paired differences. This test is based on the
   hypotheses

$$
\begin{aligned}
H_0 &: \delta = 0 \\
H_1 &: \delta \neq 0
\end{aligned}
$$

   where δ represents the mean difference in temperature at which shrinkage occurs for
   the untreated carcass halves and the carcass halves treated with electrical
   stimulation. To carry out a t-test for paired differences the carcasses must be
   independent. Without information about how the carcasses were obtained, we will
   assume that this is the case. Additionally, the differences in temperature at which
   shrinkage occurs must come from an approximately normal distribution. A Q-Q plot for
   differences in temperature at which shrinkage occurs is given in Figure 1. The
   observed differences fall quite close to the line, suggesting that the assumption of
   normality of the differences is plausible, and we proceed with the t-test.

      Figure 1: Q-Q plot of differences in temperature at which shrinkage occurs.

   Output for this test is given in Figure 2. The test statistic is given by
   t = 0.404 on 14 degrees of freedom.




Figure 2: Output for t-test for mean difference in temperature at which shrinkage occurs
for untreated carcass halves and electrically stimulated carcass halves.

     The corresponding p-value of 0.692 is not significant for any reasonable significance
     level, so we do not reject the null hypothesis, meaning that we have insufficient
     evidence to suggest that treating carcasses with electrical stimulation produces any
     difference in the mean temperature at which shrinkage occurs.
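
The paired t statistic described above can be sketched in a few lines. This is a minimal illustration rather than the SPSS procedure: the function name `paired_t_statistic` and the temperatures are hypothetical (they are not Data Set 75), and only the statistic and its degrees of freedom are computed, since the p-value would need a t-distribution table or a statistics package.

```python
import math
import statistics


def paired_t_statistic(untreated, treated):
    """Return (t, df) for H0: the mean paired difference is zero."""
    if len(untreated) != len(treated):
        raise ValueError("paired samples must have the same length")
    # One difference per carcass: untreated half minus treated half
    diffs = [u - s for u, s in zip(untreated, treated)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical shrinkage temperatures (NOT Data Set 75), one pair per carcass
untreated = [69.5, 70.0, 68.5, 69.0, 70.5]
treated = [69.0, 70.5, 68.0, 69.5, 70.0]

t_stat, df = paired_t_statistic(untreated, treated)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

With 15 carcasses the same function would return 14 degrees of freedom, matching the output quoted above.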
     If the assumptions of the t-test had not been met, we might have considered a sign
     test or Wilcoxon signed-rank test. Even though the t-test is the appropriate test,
     here I provide analyses using both the sign test and Wilcoxon signed-rank test for
     illustrative purposes. For both the sign test and Wilcoxon signed-rank test, we
     would test the hypotheses

$$
\begin{aligned}
H_0 &: \tilde{\mu} = 0 \\
H_1 &: \tilde{\mu} \neq 0
\end{aligned}
$$

where $\tilde{\mu}$ represents the median difference in temperature at which shrinkage
occurs for the untreated carcass halves and the carcass halves treated with
electrical stimulation. To carry out both tests, the carcasses must be independent.
We already addressed this requirement/assumption for the t-test. To carry out a
Wilcoxon signed-rank test, the differences would need to be approximately
symmetric. Both a histogram and a boxplot of the differences are given in Figure 3.
Neither shows strong evidence that the differences are asymmetric, so either a sign
test or a Wilcoxon signed-rank test can be carried out. Output for the two tests
is given in Figure 4. The p-values for both tests (0.791 for the sign test and 0.801
for the Wilcoxon signed-rank test) exceed any reasonable significance level, so we
would not reject the null hypothesis, meaning that we have insufficient evidence to
conclude that there is a difference in the median temperature at which shrinkage
occurs for untreated carcass halves and electrically stimulated carcass halves.
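
As a rough illustration of how the sign test arrives at its p-value, the sketch below computes the exact two-sided binomial tail. The function name `sign_test_p_value` and the sign pattern are hypothetical; the actual signs of the differences in Data Set 75 are not listed in this solution.

```python
from math import comb


def sign_test_p_value(diffs):
    """Exact two-sided sign test p-value for H0: the median difference is zero."""
    nonzero = [d for d in diffs if d != 0]  # zero differences carry no sign
    n = len(nonzero)
    n_pos = sum(1 for d in nonzero if d > 0)
    k = min(n_pos, n - n_pos)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical sign pattern: 6 positive, 8 negative and 1 zero difference
diffs = [1] * 6 + [-1] * 8 + [0]
p_value = sign_test_p_value(diffs)
print(f"p = {p_value:.3f}")
```

Only the signs matter here, which is why the sign test needs no symmetry assumption, whereas the Wilcoxon signed-rank test also uses the ranks of the absolute differences.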
For the secondary analysis of interest, we must convert temperatures at which
shrinkage occurs to binary variables, where a ‘1’ denotes a shrinkage temperature
less than 69 degrees, and a ‘0’ denotes a shrinkage temperature of 69 degrees or
more. (Note that we could also let ‘0’ denote a shrinkage temperature less than
69 degrees and ‘1’ denote a shrinkage temperature of 69 degrees or more, and our
analysis would produce identical results.) Our two samples are still dependent, and
we would like to determine whether or not the proportions of acceptable carcass
halves differ under the two treatments, so we are restricted to either a kappa test
or McNemar’s test. McNemar’s test is the appropriate test here, as we would like to
see whether or not the proportion of acceptable carcass halves changes after
applying a treatment (electrical stimulation). Since the independence assumption for this
test was already addressed when we carried out the t-test, we proceed with the test.
Output for the test is given in Figure 5. The p-value of approximately 1 means that
we will not reject the null hypothesis for any reasonable significance level, meaning
that we do not have evidence to suggest that there is a difference in the proportion
of acceptable carcass halves for untreated carcass halves and electrically stimulated
carcass halves.
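
The recoding and test described above can be sketched as follows. This is an illustration under stated assumptions: it uses the exact (binomial) form of McNemar's test, which suits small samples, the function name `mcnemar_exact` is made up for this example, and the temperatures are hypothetical rather than taken from Data Set 75.

```python
from math import comb


def mcnemar_exact(untreated, treated, cutoff=69.0):
    """Return (b, c, p) for an exact two-sided McNemar test on paired readings."""
    # Recode each half: 1 = acceptable (temperature below cutoff), 0 = otherwise
    pairs = [(int(u < cutoff), int(s < cutoff)) for u, s in zip(untreated, treated)]
    b = sum(1 for u, s in pairs if u == 1 and s == 0)  # acceptable when untreated only
    c = sum(1 for u, s in pairs if u == 0 and s == 1)  # acceptable when treated only
    n = b + c  # only discordant pairs inform the test
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical shrinkage temperatures (NOT Data Set 75)
untreated = [68.5, 69.5, 70.0, 68.0, 69.0, 70.5]
treated = [69.5, 68.5, 70.0, 68.0, 68.5, 70.5]

b, c, p_value = mcnemar_exact(untreated, treated)
print(f"discordant pairs: {b} vs {c}, p = {p_value:.3f}")
```

Swapping the 0/1 coding exchanges the roles of b and c but leaves the p-value unchanged, which is the point made in the parenthetical note above.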




Figure 3: Histogram (top) and boxplot (bottom) for difference in temperature at which
shrinkage occurs for the untreated carcass halves and the carcass halves treated with
electrical stimulation.




Figure 4: Output for sign test (left) and Wilcoxon signed-rank test (right) for median
temperature at which shrinkage occurs for untreated carcass halves and electrically
stimulated carcass halves.




Figure 5: Output for McNemar’s test for difference in proportion of acceptable carcass
halves for untreated carcass halves and electrically stimulated carcass halves.




2. Refer to Data Set 76. These data come from Mochizuki, M. et al (1984). “Effects
   of smoking on fetoplacental-maternal system during pregnancy,” American J.
   Obstet. Gyn. 149: 13–20. The study considered the effects of smoking during
   pregnancy by examining the placentas from 58 women after childbirth. Each mother was
   classified as a non-, moderate or heavy smoker during pregnancy, and the outcome
   measure was presence or absence of atrophied placental villi, finger-like structures
   that protrude from the wall to increase absorption area.
   Combine the two smoking classes to create a “smoker” class and carry out an
   appropriate test for association of villi atrophy with smoking status. Given there are
   three ordered classes of smoking (none < moderate < heavy) think about how you
   might display such data.


   We can consider either of two different statistical tests, both of which will produce
   identical results. First, if villi atrophy is associated with smoking status, that means
   that the probability of villi atrophy changes with a person’s smoking status. Thus,
   we can carry out a test to determine if the proportion of smokers with villi atrophy
   is the same as the proportion of non-smokers with villi atrophy. To do this, we
   consider the statistical hypotheses

$$
\begin{aligned}
H_0 &: \pi_s = \pi_{n.s.} \\
H_1 &: \pi_s \neq \pi_{n.s.}
\end{aligned}
$$

   where πs denotes the population proportion of smokers with villi atrophy and πn.s.
   denotes the population proportion of non-smokers with villi atrophy. By hand, we
   can verify that

$$
\begin{aligned}
n_s &= 36 \\
n_{n.s.} &= 22 \\
\hat{p}_s &= \frac{21}{36} = 0.583 \\
\hat{p}_{n.s.} &= \frac{4}{22} = 0.182 \\
\hat{p} &= \frac{21 + 4}{36 + 22} = 0.431
\end{aligned}
$$

   Before carrying out this test, it is important to determine whether or not the
   requirements/assumptions of the test are met. Without detailed information about
   the sampling mechanism, we will assume that the smokers and non-smokers in the
   study were sampled independently of each other, so the sample of smokers is
   independent of the sample of non-smokers, and individuals within each sample are
   independent. Additionally, we must confirm that $n_s \hat{p} > 5$,
   $n_s (1 - \hat{p}) > 5$, $n_{n.s.} \hat{p} > 5$, and $n_{n.s.} (1 - \hat{p}) > 5$. Since

$$
\begin{aligned}
n_s \hat{p} &= 36 \cdot 0.431 = 15.517 > 5 \\
n_s (1 - \hat{p}) &= 36 \, (1 - 0.431) = 20.483 > 5 \\
n_{n.s.} \hat{p} &= 22 \cdot 0.431 = 9.483 > 5 \\
n_{n.s.} (1 - \hat{p}) &= 22 \, (1 - 0.431) = 12.517 > 5
\end{aligned}
$$

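it is appropriate to use a normal approximation for our two sample test for
independent proportions. Then our test statistic is given by

$$
\begin{aligned}
z &= \frac{\hat{p}_s - \hat{p}_{n.s.}}
          {\sqrt{\hat{p} \, (1 - \hat{p}) \left( \frac{1}{n_s} + \frac{1}{n_{n.s.}} \right)}} \\
  &= \frac{0.583 - 0.182}
          {\sqrt{0.431 \, (1 - 0.431) \left( \frac{1}{36} + \frac{1}{22} \right)}} \\
  &= 2.996
\end{aligned}
$$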

and the corresponding p-value is given by

          P (Z > 2.996) + P (Z < −2.996) = 2 · 0.001367431 = 0.0027

Since our p-value of 0.0027 is less than any reasonable significance level, we have
sufficient evidence to reject the null hypothesis, meaning that we have evidence
to suggest that the proportion of smokers with villi atrophy is different from
the proportion of non-smokers with villi atrophy. This means that villi atrophy is
associated with smoking status.
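
The hand calculation above can be checked with a short script. This is a sketch using only the counts quoted in the solution (21 of 36 smokers and 4 of 22 non-smokers with villi atrophy); the function name `two_proportion_z_test` is an illustrative choice, not an SPSS routine.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p) for H0: pi_1 = pi_2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Smokers: 21 of 36 with atrophy; non-smokers: 4 of 22 with atrophy
z, p_value = two_proportion_z_test(21, 36, 4, 22)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```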
The second test we can consider is a chi-square test of independence. If villi atrophy
is not associated with smoking status, then villi atrophy and smoking status are
independent. Otherwise, they are dependent. To carry out this test, we consider
the hypotheses

           H0 :    smoking status and villi atrophy are independent
           H1 :    smoking status and villi atrophy are not independent

Output for the chi-square test of independence is given in Figure 6. The test statistic
is

$$
\chi^2 = 8.976, \qquad \text{d.f.} = (2 - 1) \cdot (2 - 1) = 1
$$

and the corresponding p-value is approximately 0.003. Note that these results
are identical to the two sample test for independent proportions. In particular,
$z^2 = 2.996^2 = 8.976 = \chi^2$, and the p-values are identical (the apparent
difference between the two is simply due to rounding of the p-value for the
chi-square test of independence). As we would expect, given the results of the two
sample test for independent proportions, we reject the null hypothesis, so we have
evidence to suggest that smoking status and villi atrophy are dependent; that is,
the incidence of villi atrophy changes with smoking status.

Figure 6: Output for chi-square test of independence.

Other than the previously mentioned independence assumption, the only other
assumption for the chi-square test is that no more than 20% of expected cell counts
have a value less than 5. From the output, we see that no expected cell counts are
less than 5, so using the chi-square test of independence is appropriate. Note that
the output also gives the lowest expected cell count. In fact, this quantity will
always correspond to the smallest value of $n_1 \hat{p}$, $n_1 (1 - \hat{p})$,
$n_2 \hat{p}$, and $n_2 (1 - \hat{p})$. Thus, if this value is greater than 5, we know
that the sample size requirements are met to use the normal approximation for a two
sample test for independent proportions. At the same time, $n_1 \hat{p}$,
$n_1 (1 - \hat{p})$, $n_2 \hat{p}$, and $n_2 (1 - \hat{p})$ give the expected cell
counts for a chi-square test. This further highlights that these two tests are in fact
identical except that they use different probability distributions (standard normal
distribution versus chi-square distribution). Consequently, SPSS does not include
a function for two sample tests for independent proportions because a chi-square test
of independence does exactly the same thing.
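
To make the link between the two tests concrete, the following sketch computes the Pearson chi-square statistic (without continuity correction) and the expected cell counts for the 2×2 table implied by the counts above. The "absent" counts 15 and 18 are derived as 36 − 21 and 22 − 4, and the function name `chi_square_2x2` is an illustrative choice.

```python
import math


def chi_square_2x2(table):
    """Return (chi2, p, expected) for a 2x2 table, no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # With 1 d.f., P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected


# Rows: smokers, non-smokers; columns: atrophy present, atrophy absent
table = [[21, 15], [4, 18]]
chi2, p_value, expected = chi_square_2x2(table)
smallest_expected = min(min(row) for row in expected)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, smallest expected = {smallest_expected:.3f}")
```

The smallest expected count is the non-smoker, atrophy-present cell, which equals the value 9.483 checked by hand for the normal approximation.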
To display our data, we note that our data are categorical, so a good way to graph
the data is through bar plots. In Figure 7, we give two similar barplots of smoking
status by villi atrophy. The plot on the top gives a comparison of smokers with
non-smokers, whereas the bottom gives a comparison across the different smoking
classes. These suggest that the likelihood of villi atrophy increases with smoking.




Figure 7: Bar plots of smoking status by villi atrophy.

3. An environmental scientist studying the impact of pollution on species diversity
   along two nearby rivers carried out a survey in which plots (quadrats) of size 30
   metres by 20 metres were randomly chosen from along the banks of the rivers.
   Within each quadrat the numbers of different tree species were recorded. The data
   were as follows:

   Valley River (13 quadrats):   9  9 15 12 13 13 13  8 11  9 10  9 14
   Ridge River  (15 quadrats):  13 10  6  7 10  9 18  6  9  9 11  7  8  6 11

   What would you conclude from these data in terms of differences in species
   diversity? Think about the nature of the data, what might be the best way to compare
   them, what assumptions are being made in the comparison, etc. Are there any
   values which might need special consideration? What is their effect on the various
   analyses if included or excluded?


   Examining boxplots of species for each river, given by Figure 8, the value of 18
   species recorded for the Ridge River is an outlier. Without a valid reason to omit
   this data point (e.g., the value was clearly recorded wrong), it should remain in the
   analysis. However, for instructional purposes we will carry out analyses including
   and excluding this data point to see how much of an impact it has on the analyses.


   We would like to test the hypotheses

$$
\begin{aligned}
H_0 &: \mu_V = \mu_R \\
H_1 &: \mu_V \neq \mu_R
\end{aligned}
$$

   where µV denotes the average number of species for the Valley River, and µR denotes
   the average number of species for the Ridge River. We might consider carrying out
   a t-test for two independent means. To carry out such a test, we must assume that
   the two rivers are independent of each other and that sampled quadrats within each
   river are independent of each other. Since the number of quadrats is not all that
   large for either river, we must see if it is plausible that the number of species per
   quadrat comes from an approximately normal distribution for each river. To do this,
   we look at Q-Q plots for species numbers separately for the two rivers. Such plots
   are given in Figure 9. We note that the points fall quite close to the line with the
   lone exception being the outlier, so it seems plausible that the normality assumption
   has been met. The third assumption that must be checked is that of equal variance
   for number of species per quadrat for the two rivers.

Figure 8: Boxplots of species per quadrat for the two rivers.

SPSS automatically carries out Levene’s test with a t-test for two independent
means, and output from Levene’s test and the t-test is given in Figure 10. Levene’s
test essentially tests

$$
\begin{aligned}
H_0 &: \sigma_V^2 = \sigma_R^2 \\
H_1 &: \sigma_V^2 \neq \sigma_R^2
\end{aligned}
$$

where $\sigma_V^2$ denotes the population variance for number of species per quadrat for
the Valley River, and $\sigma_R^2$ denotes the population variance for number of species
per quadrat for the Ridge River. This test produces a p-value of 0.696 which is
larger than any reasonable significance level, so we do not reject the null hypothesis,
meaning that we have insufficient evidence to conclude that the variances for number
of species per quadrat are different for the two rivers. Thus, it seems plausible that
our assumption of equal variances is met. With the necessary assumptions met, we
proceed with the t-test and note that we obtained a test statistic of t = 1.711 on
26 degrees of freedom. The corresponding p-value of 0.099 is not significant at the
α = 0.05 significance level, so we would not reject H0 , and we cannot conclude that
there is a difference between the two rivers in terms of diversity of the species.
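
The Levene check used above can be sketched as a one-way ANOVA F statistic computed on absolute deviations from each group mean. This assumes the mean-centred form of Levene's test; the function name `levene_w` is illustrative, and only the statistic W is computed, since its p-value would be read from an F distribution with 1 and N − 2 degrees of freedom.

```python
import statistics


def levene_w(*groups):
    """Levene's W: one-way ANOVA F statistic on |x - group mean|."""
    abs_dev = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    k = len(abs_dev)
    n_total = sum(len(z) for z in abs_dev)
    grand_mean = sum(sum(z) for z in abs_dev) / n_total
    # Between-group and within-group sums of squares of the absolute deviations
    between = sum(len(z) * (statistics.mean(z) - grand_mean) ** 2 for z in abs_dev)
    within = sum((v - statistics.mean(z)) ** 2 for z in abs_dev for v in z)
    return (between / (k - 1)) / (within / (n_total - k))


# Species counts per quadrat, as listed in the question
valley = [9, 9, 15, 12, 13, 13, 13, 8, 11, 9, 10, 9, 14]
ridge = [13, 10, 6, 7, 10, 9, 18, 6, 9, 9, 11, 7, 8, 6, 11]

w = levene_w(valley, ridge)
print(f"Levene W = {w:.3f}")
```

A small W means the typical spread around each river's mean is similar, which is consistent with not rejecting equal variances.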

Figure 9: Q-Q plots for species numbers per quadrat for the Valley River (top) and Ridge
River (bottom).

Figure 10: Output for Levene’s test and t-test for two independent means.

To assess the impact of the outlier on this result, we repeat the analysis with this
outlier omitted. Doing so, our Q-Q plot for the Ridge River is given by Figure 11
and more strongly suggests normality than before. Repeating Levene’s test on these
data, we obtain the output given in Figure 12. The p-value of 0.540 again suggests
that it is plausible that the assumption of equal variances is met, so the necessary
requirements of the test are still met, and we repeat our t-test on the original data
minus the outlier; its output is also given in Figure 12. With the outlier removed, the
test statistic is given by t = 2.838 with 25 degrees of freedom, and the p-value is
0.009. This p-value is smaller than most significance levels we might consider, so
we reject the null hypothesis, meaning that we have evidence to suggest that there
is a difference between the two rivers in terms of diversity of the species. Thus, this
outlier has a rather significant impact on the results of our analysis.

Figure 11: Q-Q plot for species numbers per quadrat for the Ridge River.

Figure 12: Output for Levene’s test and t-test for two independent means with outlier
removed.

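
The effect of the outlier on the equal-variance t statistic can be reproduced from the data listed in the question. The sketch below is illustrative (the function name `pooled_t_statistic` is made up here) and returns only the statistic and degrees of freedom; p-values would need a t distribution from a statistics package.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Return (t, df) for the two-sample t-test with a pooled variance."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


valley = [9, 9, 15, 12, 13, 13, 13, 8, 11, 9, 10, 9, 14]
ridge = [13, 10, 6, 7, 10, 9, 18, 6, 9, 9, 11, 7, 8, 6, 11]
ridge_no_outlier = [x for x in ridge if x != 18]  # drop the quadrat with 18 species

t_full, df_full = pooled_t_statistic(valley, ridge)
t_trim, df_trim = pooled_t_statistic(valley, ridge_no_outlier)
print(f"with outlier:    t = {t_full:.3f}, df = {df_full}")
print(f"without outlier: t = {t_trim:.3f}, df = {df_trim}")
```

Removing the single value both lowers the Ridge River mean and shrinks the pooled variance, so the numerator grows while the denominator falls.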
     Non-parametric tests are not as adversely affected by outliers, so we might have
     considered a Wilcoxon rank-sum/Mann-Whitney U test instead of a t-test. This
     test has only the minimal assumption that the two rivers are independent of each
     other and sampled quadrats are independent of each other for each river, and the
     hypotheses are given by

$$
\begin{aligned}
H_0 &: \tilde{\mu}_V = \tilde{\mu}_R \\
H_1 &: \tilde{\mu}_V \neq \tilde{\mu}_R
\end{aligned}
$$

     where $\tilde{\mu}_V$ denotes the median number of species per quadrat for the Valley
     River, and $\tilde{\mu}_R$ denotes the median number of species per quadrat for the
     Ridge River.
     Output for such a test on the full data set is given in Figure 13. The p-value of
     0.058 means that we would not reject the null hypothesis at the α = 0.05 significance
     level, so we have insufficient evidence to suggest that there is a difference in the
     median number of species per quadrat for the Valley River and the Ridge River.




          Figure 13: Output for Wilcoxon rank-sum/ Mann-Whitney U test.

     If we remove the outlier, we obtain the results given in Figure 14. As we would
     expect, we get a smaller p-value of 0.019 when we remove the outlier, and we
     would reject the null hypothesis at the α = 0.05 significance level. Thus, we would
     conclude that we have sufficient evidence to suggest that there is a difference in the
     median number of species per quadrat for the Valley River and the Ridge River.




Figure 14: Output for Wilcoxon rank-sum/ Mann-Whitney U test with outlier removed.
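
For readers who want to see how the rank-based comparison works, here is a sketch of the Mann-Whitney U statistic with midranks for ties and a tie-corrected normal approximation. The function name `mann_whitney_u` is illustrative. Because this is an asymptotic approximation without continuity correction, the p-values it prints need not agree to the third decimal with the SPSS output quoted above, which may rest on exact or corrected calculations.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_whitney_u(a, b):
    """Return (U, z, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    counts = Counter(a + b)
    # Midranks: tied values share the average of the ranks they occupy
    midrank, start = {}, 1
    for value in sorted(counts):
        t = counts[value]
        midrank[value] = start + (t - 1) / 2
        start += t
    rank_sum_a = sum(midrank[x] for x in a)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u_a, n1 * n2 - u_a)
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    sd = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    z = (u - n1 * n2 / 2) / sd  # z <= 0 because u is the smaller of the two U values
    return u, z, 2 * NormalDist().cdf(z)


valley = [9, 9, 15, 12, 13, 13, 13, 8, 11, 9, 10, 9, 14]
ridge = [13, 10, 6, 7, 10, 9, 18, 6, 9, 9, 11, 7, 8, 6, 11]
ridge_no_outlier = [x for x in ridge if x != 18]

u_full, z_full, p_full = mann_whitney_u(valley, ridge)
u_trim, z_trim, p_trim = mann_whitney_u(valley, ridge_no_outlier)
print(f"with outlier:    U = {u_full}, p = {p_full:.3f}")
print(f"without outlier: U = {u_trim}, p = {p_trim:.3f}")
```

Since 18 is the largest observation, it only ever occupies the top rank, so removing it changes U far less dramatically than it changes the t statistic.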

     Note that, with the outlier included in the analysis, the p-value produced by the
     Wilcoxon rank-sum/Mann-Whitney U test (0.058) is substantially smaller than that
     produced by the t-test (0.099). When we remove the outlier, we observe the opposite
     relationship, as the p-value produced by the Wilcoxon rank-sum/Mann-Whitney U test
     (0.019) is larger than that produced by the t-test (0.009). This highlights important
     aspects of parametric and non-parametric tests. Parametric tests are quite sensitive
     to departures from the assumed distributions of the samples (in the case of the
     t-test, normal distributions), so outliers will have a significant impact on the
     results and tend to inflate the p-value. Since non-parametric tests are not tied to
     specific distributional assumptions for the samples, outliers have a much less
     significant impact, so the p-value is inflated much less when outliers are present.
     At the same time, if there are no outliers present and the samples fit the
     distributional assumptions of the parametric test, that test will typically produce
     a smaller p-value than a non-parametric test, as the minimal distributional
     assumptions of non-parametric tests make their p-values more conservative.




                                      15

More Related Content

PPTX
New chm 152 unit 2 power points sp13
PPTX
2012 15 3 and 15 4
PPTX
2016 topic 5.1 measuring energy changes
PDF
Workbook for General Chemistry II
PDF
Magmalogy assignment
DOCX
ΔΡΑΣΤΗΡΙΟΤΗΤΕΣ ΣΤΗΝ ΤΑΞΗ ΜΕ ΘΕΜΑ "ΤΟ ΧΡΟΝΟ"
PDF
Workshop 4
PDF
γεωλογια ευρωπη 1
New chm 152 unit 2 power points sp13
2012 15 3 and 15 4
2016 topic 5.1 measuring energy changes
Workbook for General Chemistry II
Magmalogy assignment
ΔΡΑΣΤΗΡΙΟΤΗΤΕΣ ΣΤΗΝ ΤΑΞΗ ΜΕ ΘΕΜΑ "ΤΟ ΧΡΟΝΟ"
Workshop 4
γεωλογια ευρωπη 1

Viewers also liked (20)

PPTX
A Teacher's Heart
PDF
Bancos de dados distribuídos
PPTX
Data security concepts chapter 2
DOC
levis
DOCX
A tecnologia educação
PPTX
Copyright and Technology
DOCX
Multithreaded tecnologia
DOCX
γράμμα μητέρας για I phone
PPS
Athens beauty
DOC
penduduk indonesia
PPT
ο ήλιος ο ηλιάτορας
PPT
εικόνες χριστουγέννων
PPT
Proyecto poncho
PDF
Matematica aplicadaparalatecnicadelautomovil
PDF
Prueba 1
PPTX
أنماط شخصية
PPTX
Les energies
PDF
Actividades
PDF
Actividades (masa etc)
PPTX
Biografia de-eloy-alfaro
A Teacher's Heart
Bancos de dados distribuídos
Data security concepts chapter 2
levis
A tecnologia educação
Copyright and Technology
Multithreaded tecnologia
γράμμα μητέρας για I phone
Athens beauty
penduduk indonesia
ο ήλιος ο ηλιάτορας
εικόνες χριστουγέννων
Proyecto poncho
Matematica aplicadaparalatecnicadelautomovil
Prueba 1
أنماط شخصية
Les energies
Actividades
Actividades (masa etc)
Biografia de-eloy-alfaro
Ad

Similar to Workshop 4 -solutions (20)

PPT
Independent sample t test
PPTX
Chi square(hospital admin) A
PPTX
slides Testing of hypothesis.pptx
PPTX
Chi square
PPTX
Company Induction process and Onboarding
PPTX
Chi-Square test.pptx
DOCX
Non parametrics tests
PPTX
Hypothesis testing
PPTX
Hypothesis testing
PPTX
Test of significance
PPT
Chapter11
PDF
202003241550010409rajeev_pandey_Non-Parametric.pdf
PDF
P G STAT 531 Lecture 7 t test and Paired t test
PPT
Hypothesis Testing Assignment Help
PPTX
Intro to tests of significance qualitative
DOCX
Andy Lee Pressure Temp Lab
PDF
Hmisiri nonparametrics book
DOCX
Measurement  of  the  angle  θ          .docx
PPTX
Test of-significance : Z test , Chi square test
PPTX
Accelerated stability studies
Independent sample t test
Chi square(hospital admin) A
slides Testing of hypothesis.pptx
Chi square
Company Induction process and Onboarding
Chi-Square test.pptx
Non parametrics tests
Hypothesis testing
Hypothesis testing
Test of significance
Chapter11
202003241550010409rajeev_pandey_Non-Parametric.pdf
P G STAT 531 Lecture 7 t test and Paired t test
Hypothesis Testing Assignment Help
Intro to tests of significance qualitative
Andy Lee Pressure Temp Lab
Hmisiri nonparametrics book
Measurement  of  the  angle  θ          .docx
Test of-significance : Z test , Chi square test
Accelerated stability studies
Ad

Recently uploaded (20)

PPTX
Chinmaya Tiranga Azadi Quiz (Class 7-8 )
PPTX
History, Philosophy and sociology of education (1).pptx
PDF
RTP_AR_KS1_Tutor's Guide_English [FOR REPRODUCTION].pdf
PDF
SOIL: Factor, Horizon, Process, Classification, Degradation, Conservation
PDF
Weekly quiz Compilation Jan -July 25.pdf
PPTX
Radiologic_Anatomy_of_the_Brachial_plexus [final].pptx
PPTX
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
PDF
What if we spent less time fighting change, and more time building what’s rig...
PPTX
UV-Visible spectroscopy..pptx UV-Visible Spectroscopy – Electronic Transition...
PDF
advance database management system book.pdf
PDF
Trump Administration's workforce development strategy
PDF
Paper A Mock Exam 9_ Attempt review.pdf.
PPTX
Introduction to Building Materials
PDF
A systematic review of self-coping strategies used by university students to ...
PDF
Indian roads congress 037 - 2012 Flexible pavement
PPTX
Orientation - ARALprogram of Deped to the Parents.pptx
PPTX
Unit 4 Skeletal System.ppt.pptxopresentatiom
PDF
Empowerment Technology for Senior High School Guide
PDF
LDMMIA Reiki Yoga Finals Review Spring Summer
PDF
Chinmaya Tiranga quiz Grand Finale.pdf
Chinmaya Tiranga Azadi Quiz (Class 7-8 )
History, Philosophy and sociology of education (1).pptx
RTP_AR_KS1_Tutor's Guide_English [FOR REPRODUCTION].pdf
SOIL: Factor, Horizon, Process, Classification, Degradation, Conservation
Weekly quiz Compilation Jan -July 25.pdf
Radiologic_Anatomy_of_the_Brachial_plexus [final].pptx
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
What if we spent less time fighting change, and more time building what’s rig...
UV-Visible spectroscopy..pptx UV-Visible Spectroscopy – Electronic Transition...
advance database management system book.pdf
Trump Administration's workforce development strategy
Paper A Mock Exam 9_ Attempt review.pdf.
Introduction to Building Materials
A systematic review of self-coping strategies used by university students to ...
Indian roads congress 037 - 2012 Flexible pavement
Orientation - ARALprogram of Deped to the Parents.pptx
Unit 4 Skeletal System.ppt.pptxopresentatiom
Empowerment Technology for Senior High School Guide
LDMMIA Reiki Yoga Finals Review Spring Summer
Chinmaya Tiranga quiz Grand Finale.pdf

Workshop 4 -solutions

  • 1. MAS230 : Biostatistical Methods Workshop 4 1. Consider Data Set 75. These data come from Judge, M.D. et al (1984). “Thermal shrinkage temperature of intramuscular collagen of bulls and steers,” Journal of Animal Science 59: 706–9, and are reproduced in Samuels and Witmer (1999), Statistics for Life Sciences, 2nd Edition, Prentice Hall, p. 357. The study is designed to assess the effect of electrical stimulation of a beef carcass in terms of improving the tenderness of the meat. In this test, beef carcasses were split in half. One side was subjected to a brief electrical current while the other was an untreated control. From each side a specimen of connective tissue (collagen) was taken, and the temperature at which shrinkage occurred was determined. Increased tenderness is related to a low shrinkage temperature. Carry out analyses to assess the impact of electrical stimulation on the meat tender- ness. Use both parametric and non-parametric methods and compare the results. Suppose acceptable tenderness corresponds to a shrinkage temperature less than 69 degrees. How would you test to see if the proportions of acceptable tenderness values differed under the two treatments? Use SPSS to create appropriate variables to enable this to be tested and carry out the analysis. Don’t forget that the sample sizes are small here. To assess the impact of electrical stimulation on meat tenderness, we will examine whether the mean temperature at which shrinkage occurs is different for the two groups. Since one half of each carcass is assigned to the untreated group, and the other half is assigned to the group that receives electrical stimulation, the two samples are not independent. Consequently, the most appropriate analysis is a test for paired differences. We might consider a t-test for paired differences. 
This test is based on the hypothe- ses H0 : δ = 0 H1 : δ = 0 where δ represents the mean difference in temperature at which shrinkage occurs for the untreated carcass halves and the carcass halves treated with electrical stimula- tion. To carry out a t-test for paired differences the carcasses must be independent. 1
  • 2. Without information about how the carcasses were obtained, we will assume that this is the case. Additionally, the differences in temperature at which shrinkage occurs must come from an approximately normal distribution. A Q-Q plot for dif- ferences in temperature at which shrinkage occurs is given in Figure 1. The observed Figure 1: Q-Q plot of differences in temperature at which shrinkage occurs. differences fall quite close to the line, suggesting that the assumption of normality of the differences is plausible, and we proceed with the t-test. Output for this test is given in Figure 2. The test statistic is given by t = 0.404 on 14 degrees of freedom. Figure 2: Output for t-test for mean difference in temperature at which shrinkage occurs for untreated carcass halves and electrically stimulated carcass halves. The corresponding p-value of 0.692 is not significant for any reasonable significance level, so we do not reject the null hypothesis, meaning that we have insufficient evidence to suggest that treating carcasses with electrical stimulation produces any difference in the mean temperature at which shrinkage occurs. If the assumptions of the t-test had not met, we might have considered a sign test or Wilcoxon signed-rank test. Even though the t-test is the appropriate test, 2
  • 3. here I provide analyses using both the sign test and Wilcoxon signed-rank test for illustrative purposes. For both the sign test and Wilcoxon signed-rank test, we would test the hypotheses $H_0: \tilde{\mu} = 0$ versus $H_1: \tilde{\mu} \neq 0$, where $\tilde{\mu}$ represents the median difference in temperature at which shrinkage occurs for the untreated carcass halves and the carcass halves treated with electrical stimulation. To carry out both tests, the carcasses must be independent. We already addressed this assumption for the t-test. To carry out a Wilcoxon signed-rank test, the distribution of the differences would also need to be approximately symmetric. Both a histogram and boxplot for differences are given in Figure 3. Neither shows strong evidence that the differences are asymmetric, so either a sign test or a Wilcoxon signed-rank test can be carried out. Output for the two tests is given in Figure 4. The p-values for both tests (0.791 for the sign test and 0.801 for the Wilcoxon signed-rank test) exceed any reasonable significance level, so we would not reject the null hypothesis, meaning that we have insufficient evidence to conclude that there is a difference in the median temperature at which shrinkage occurs for untreated carcass halves and electrically stimulated carcass halves. For the secondary analysis of interest, we must convert temperatures at which shrinkage occurs to binary variables, where a ‘1’ denotes a shrinkage temperature less than 69 degrees, and a ‘0’ denotes a shrinkage temperature of 69 degrees or more. (Note that we could also let ‘0’ denote a shrinkage temperature less than 69 degrees and ‘1’ denote a shrinkage temperature of 69 degrees or more, and our analysis would produce identical results.) Our two samples are still dependent, and we would like to determine whether or not the proportions of acceptable carcass halves differ under the two treatments, so we are restricted to either a kappa test or McNemar’s test.
McNemar’s test is the appropriate test here, as we would like to see whether or not the proportion of acceptable carcass halves changes after applying a treatment (electrical stimulation). Since the independence assumption for this test was already addressed when we carried out the t-test, we proceed with the test. Output for the test is given in Figure 5. The p-value of approximately 1 means that we will not reject the null hypothesis for any reasonable significance level, meaning that we do not have evidence to suggest that there is a difference in the proportion of acceptable carcass halves for untreated carcass halves and electrically stimulated carcass halves.
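Because the sample is small, the exact (binomial) form of McNemar’s test is the natural choice. The sketch below shows how that p-value could be computed from the two discordant counts. The function name `mcnemar_exact`, its arguments `b` and `c`, and the counts in the usage lines are all hypothetical and are not taken from Data Set 75 or from the SPSS output.

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: pairs acceptable (< 69 degrees) only on the untreated side
    c: pairs acceptable only on the stimulated side
    Under H0 each discordant pair is equally likely to fall either way,
    so min(b, c) is compared against a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of any change
    k = min(b, c)
    # lower-tail probability P(X <= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # double the tail for a two-sided test, capped at 1
    return min(1.0, 2 * tail)

# Hypothetical discordant counts for illustration only (NOT from Data Set 75).
print(mcnemar_exact(2, 2))
print(mcnemar_exact(1, 6))
```

Concordant pairs (both halves acceptable, or both unacceptable) carry no information about a change and do not enter the calculation. Equal discordant counts give a p-value of 1, which is one way a result like the one reported above can arise.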
  • 4. Figure 3: Histogram (top) and boxplot (bottom) for difference in temperature at which shrinkage occurs for the untreated carcass halves and the carcass halves treated with electrical stimulation.
  • 5. Figure 4: Output for sign test (left) and Wilcoxon signed-rank test (right) for median temperature at which shrinkage occurs for untreated carcass halves and electrically stimulated carcass halves. Figure 5: Output for McNemar’s test for difference in proportion of acceptable carcass halves for untreated carcass halves and electrically stimulated carcass halves.
  • 6. 2. Refer to Data Set 76. These data come from Mochizuki, M. et al. (1984). “Effects of smoking on fetoplacental-maternal system during pregnancy,” American J. Obstet. Gyn. 149: 13–20. The study considered the effects of smoking during pregnancy by examining the placentas from 58 women after childbirth. Each mother was classified as a non-, moderate or heavy smoker during pregnancy, and the outcome measure was presence or absence of atrophied placental villi, finger-like structures that protrude from the wall to increase absorption area. Combine the two smoking classes to create a “smoker” class and carry out an appropriate test for association of villi atrophy with smoking status. Given there are three ordered classes of smoking (none < moderate < heavy), think about how you might display such data. We can consider either of two different statistical tests, both of which will produce identical results. First, if villi atrophy is associated with smoking status, that means that the probability of villi atrophy changes with a person’s smoking status. Thus, we can carry out a test to determine if the proportion of smokers with villi atrophy is the same as the proportion of non-smokers with villi atrophy. To do this, we consider the statistical hypotheses $H_0: \pi_s = \pi_{n.s.}$ versus $H_1: \pi_s \neq \pi_{n.s.}$, where $\pi_s$ denotes the population proportion of smokers with villi atrophy and $\pi_{n.s.}$ denotes the population proportion of non-smokers with villi atrophy. By hand, we can verify that $n_s = 36$, $n_{n.s.} = 22$, $\hat{p}_s = \frac{21}{36} = 0.583$, $\hat{p}_{n.s.} = \frac{4}{22} = 0.182$, and $\hat{p} = \frac{21 + 4}{36 + 22} = 0.431$. Before carrying out this test, it is important to determine whether or not the assumptions of the test are met. Without detailed information about the sampling mechanism, we will assume that the smokers and non-smokers in the study were sampled independently of each other, so the sample of smokers is independent of the sample of non-smokers, and individuals within each sample are independent.
Additionally, we must confirm that $n_s \hat{p} > 5$, $n_s (1 - \hat{p}) > 5$, $n_{n.s.} \hat{p} > 5$,
  • 7. and $n_{n.s.} (1 - \hat{p}) > 5$. Since $n_s \hat{p} = 36 \cdot 0.431 = 15.517 > 5$, $n_s (1 - \hat{p}) = 36 (1 - 0.431) = 20.483 > 5$, $n_{n.s.} \hat{p} = 22 \cdot 0.431 = 9.483 > 5$, and $n_{n.s.} (1 - \hat{p}) = 22 (1 - 0.431) = 12.517 > 5$, it is appropriate to use a normal approximation for our two-sample test for independent proportions. Then our test statistic is given by $z = \frac{\hat{p}_s - \hat{p}_{n.s.}}{\sqrt{\hat{p} (1 - \hat{p}) \left( \frac{1}{n_s} + \frac{1}{n_{n.s.}} \right)}} = \frac{0.583 - 0.182}{\sqrt{0.431 (1 - 0.431) \left( \frac{1}{36} + \frac{1}{22} \right)}} = 2.996$ and the corresponding p-value is given by $P(Z > 2.996) + P(Z < -2.996) = 2 \cdot 0.001367431 = 0.0027$. Since our p-value of 0.0027 is less than any reasonable significance level, we have sufficient evidence to reject the null hypothesis, meaning that we have evidence to suggest that the proportion of smokers with villi atrophy is different from the proportion of non-smokers with villi atrophy. This means that villi atrophy is associated with smoking status. The second test we can consider is a chi-square test of independence. If villi atrophy is not associated with smoking status, then villi atrophy and smoking status are independent. Otherwise, they are dependent. To carry out this test, we consider the hypotheses $H_0$: smoking status and villi atrophy are independent, versus $H_1$: smoking status and villi atrophy are not independent. Output for the chi-square test of independence is given in Figure 6. The test statistic is $\chi^2 = 8.976$ with d.f. $= (2 - 1) \cdot (2 - 1) = 1$, and the corresponding p-value is approximately 0.003. Note that these results are identical to the two-sample test for independent proportions. In particular,
  • 8. $z^2 = 2.996^2 = 8.976 = \chi^2$, and the p-values are identical (the apparent difference between 0.0027 and 0.003 is simply due to rounding of the p-value for the chi-square test of independence). [Figure 6: Output for chi-square test of independence.] As we would expect, given the results of the two-sample test for independent proportions, we reject the null hypothesis, so we have evidence to suggest that smoking status and villi atrophy are dependent; that is, the incidence of villi atrophy changes with smoking status. Other than the previously mentioned independence assumption, the only other assumption for the chi-square test is that no more than 20% of expected cell counts have a value less than 5. From the output, we see that no expected cell counts are less than 5, so using the chi-square test of independence is appropriate. Note that the output also gives the lowest expected cell count. In fact, this quantity will always correspond to the smallest value of $n_1 \hat{p}$, $n_1 (1 - \hat{p})$, $n_2 \hat{p}$, and $n_2 (1 - \hat{p})$. Thus, if this value is greater than 5, we know that the sample size requirements are met to use the normal approximation for a two-sample test for independent proportions. At the same time, $n_1 \hat{p}$, $n_1 (1 - \hat{p})$, $n_2 \hat{p}$, and $n_2 (1 - \hat{p})$ give the expected cell counts for a chi-square test. This further highlights that these two tests are in fact identical except that they use different probability distributions (standard normal distribution versus chi-square distribution). Consequently, SPSS does not include a function for two-sample tests for independent proportions because a chi-square test of independence does exactly the same thing. To display our data, we note that our data are categorical, so a good way to graph the data is through bar plots. In Figure 7, we give two similar bar plots of smoking status by villi atrophy. The plot on the top gives a comparison of smokers with non-smokers, whereas the bottom gives a comparison across the different smoking classes.
These suggest that the likelihood of villi atrophy increases with smoking.
  • 9. Figure 7: Bar plots of smoking status by villi atrophy.
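The equivalence between the two-sample test for proportions and the chi-square test of independence can be checked numerically. The following is a minimal sketch using only the counts quoted in the solution (21 of 36 smokers and 4 of 22 non-smokers with villi atrophy); the variable names are illustrative, and the chi-square statistic is computed without a continuity correction, which is an assumption about the SPSS row being quoted.

```python
from math import erfc, sqrt

# Counts from the worked solution: 21 of 36 smokers and
# 4 of 22 non-smokers showed villi atrophy.
x_s, n_s = 21, 36
x_ns, n_ns = 4, 22

# Two-sample z-test using the pooled proportion.
p_s, p_ns = x_s / n_s, x_ns / n_ns
p_pool = (x_s + x_ns) / (n_s + n_ns)
z = (p_s - p_ns) / sqrt(p_pool * (1 - p_pool) * (1 / n_s + 1 / n_ns))
p_z = erfc(abs(z) / sqrt(2))  # two-sided standard normal tail area

# Pearson chi-square on the 2x2 table (no continuity correction).
observed = [[x_s, n_s - x_s], [x_ns, n_ns - x_ns]]
row_totals = [n_s, n_ns]
col_totals = [x_s + x_ns, (n_s - x_s) + (n_ns - x_ns)]
total = n_s + n_ns
expected = [[r * c / total for c in col_totals] for r in row_totals]
chi2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)
p_chi2 = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 d.f.

# Smallest expected count, used for the sample-size check.
min_expected = min(min(row) for row in expected)

print(f"z = {z:.3f}, two-sided p = {p_z:.4f}")
print(f"chi2 = {chi2:.3f}, p = {p_chi2:.4f}, min expected = {min_expected:.3f}")
```

The two p-values agree because a chi-square variable with one degree of freedom is the square of a standard normal variable, so the upper tail beyond $z^2$ equals the two-sided normal tail beyond $|z|$.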
  • 10. 3. An environmental scientist studying the impact of pollution on species diversity along two nearby rivers carried out a survey in which plots (quadrats) of size 30 metres by 20 metres were randomly chosen from along the banks of the rivers. Within each quadrat the numbers of different tree species were recorded. The data were as follows:

        Valley River          Ridge River
         9  9 15 12 13        13 10  6  7 10
        13 13  8 11  9         9 18  6  9  9
        10  9 14              11  7  8  6 11

    What would you conclude from these data in terms of differences in species diversity? Think about the nature of the data, what might be the best way to compare them, what assumptions are being made in the comparison, etc. Are there any values which might need special consideration? What is their effect on the various analyses if included or excluded? Examining boxplots of species for each river, given by Figure 8, the value of 18 species recorded for the Ridge River is an outlier. Without a valid reason to omit this data point (e.g., the value was clearly recorded incorrectly), it should remain in the analysis. However, for instructional purposes we will carry out analyses including and excluding this data point to see how much of an impact it has on the analyses. We would like to test the hypotheses $H_0: \mu_V = \mu_R$ versus $H_1: \mu_V \neq \mu_R$, where $\mu_V$ denotes the average number of species for the Valley River, and $\mu_R$ denotes the average number of species for the Ridge River. We might consider carrying out a t-test for two independent means. To carry out such a test, we must assume that the two rivers are independent of each other and that sampled quadrats within each river are independent of each other. Since the number of quadrats is not all that large for either river, we must see if it is plausible that the number of species per quadrat comes from an approximately normal distribution for each river. To do this, we look at Q-Q plots for species numbers separately for the two rivers. Such plots are given in Figure 9.
We note that the points fall quite close to the line with the lone exception being the outlier, so it seems plausible that the normality assumption has been met. The third assumption that must be checked is that of equal variance
  • 11. for number of species per quadrat for the two rivers. [Figure 8: Boxplots of species per quadrat.] SPSS automatically carries out Levene’s test with a t-test for two independent means, and output from Levene’s test and the t-test is given in Figure 10. Levene’s test essentially tests $H_0: \sigma_V^2 = \sigma_R^2$ versus $H_1: \sigma_V^2 \neq \sigma_R^2$, where $\sigma_V^2$ denotes the population variance for number of species per quadrat for the Valley River, and $\sigma_R^2$ denotes the population variance for number of species per quadrat for the Ridge River. This test produces a p-value of 0.696, which is larger than any reasonable significance level, so we do not reject the null hypothesis, meaning that we have insufficient evidence to conclude that the variances for number of species per quadrat are different for the two rivers. Thus, it seems plausible that our assumption of equal variances is met. With the necessary assumptions met, we proceed with the t-test and note that we obtained a test statistic of t = 1.711 on 26 degrees of freedom. The corresponding p-value of 0.099 is not significant at the α = 0.05 significance level, so we would not reject $H_0$, and we cannot conclude that there is a difference between the two rivers in terms of diversity of the species. To assess the impact of the outlier on this result, we repeat the analysis with this outlier omitted. Doing so, our Q-Q plot for the Ridge River is given by Figure 11 and more strongly suggests normality than before. Repeating Levene’s test on these data, we obtain the output given in Figure 12. The p-value of 0.540 again suggests
  • 12. Figure 9: Q-Q plots for species numbers per quadrat for the Valley River (top) and Ridge River (bottom). Figure 10: Output for Levene’s test and t-test for two independent means.
  • 13. that it is plausible that the assumption of equal variances is met, so the necessary requirements of the test are still met, and we repeat our t-test for the original data, minus the outlier. [Figure 11: Q-Q plot for species numbers per quadrat for the Ridge River.] The output is given in Figure 12. With the outlier removed, the test statistic is given by t = 2.838 with 25 degrees of freedom, and the p-value is 0.009. [Figure 12: Output for Levene’s test and t-test for two independent means with outlier removed.] This p-value is smaller than most significance levels we might consider, so we reject the null hypothesis, meaning that we have evidence to suggest that there is a difference between the two rivers in terms of diversity of the species. Thus, this single outlier has a substantial impact on the results of our analysis. Non-parametric tests are not as adversely affected by outliers, so we might have considered a Wilcoxon rank-sum/Mann-Whitney U test instead of a t-test. This test has only the minimal assumption that the two rivers are independent of each other and sampled quadrats are independent of each other for each river, and the
  • 14. hypotheses are given by $H_0: \tilde{\mu}_V = \tilde{\mu}_R$ versus $H_1: \tilde{\mu}_V \neq \tilde{\mu}_R$, where $\tilde{\mu}_V$ denotes the median number of species per quadrat for the Valley River, and $\tilde{\mu}_R$ denotes the median number of species per quadrat for the Ridge River. Output for such a test on the full data set is given in Figure 13. The p-value of 0.058 means that we would not reject the null hypothesis at the α = 0.05 significance level, so we have insufficient evidence to suggest that there is a difference in the median number of species per quadrat for the Valley River and the Ridge River. [Figure 13: Output for Wilcoxon rank-sum/Mann-Whitney U test.] If we remove the outlier, we obtain the results given in Figure 14. As we would expect, we get a smaller p-value of 0.019 when we remove the outlier, and we would reject the null hypothesis at the α = 0.05 significance level. Thus, we would conclude that we have sufficient evidence to suggest that there is a difference in the median number of species per quadrat for the Valley River and the Ridge River. [Figure 14: Output for Wilcoxon rank-sum/Mann-Whitney U test with outlier removed.] Note that, with the outlier included in the analysis, the p-value produced by the Wilcoxon rank-sum/Mann-Whitney U test (0.058) is substantially smaller than that produced by the t-test (0.099). When we remove the outlier, we observe the opposite relationship, as the p-value produced by the Wilcoxon rank-sum/Mann-Whitney U test (0.019) is larger than that produced by the t-test (0.009). This highlights important aspects of parametric and non-parametric tests. Parametric tests are quite sensitive to departures
  • 15. from the assumed distributions of the samples (in the case of the t-test, normal distributions), so outliers can have a large impact on the results and tend to inflate the p-value. Since non-parametric tests are not tied to specific distributional assumptions for the samples, outliers have much less impact on them and tend to inflate the p-value much less. At the same time, if there are no outliers present and the samples fit the distributional assumptions of the parametric test, that test will typically produce a smaller p-value than a non-parametric test, as the minimal distributional assumptions of non-parametric tests make them more conservative.
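The effect of the outlier on the t-test can be reproduced with a short calculation. The sketch below computes the equal-variance (pooled) two-sample t statistic with and without the value of 18. It assumes the data table is read as 13 Valley River quadrats and 15 Ridge River quadrats (three rows of five values per river, with the last Valley row holding three values), which is the split consistent with the reported 26 and 25 degrees of freedom. The helper name `pooled_t` is illustrative, and p-values are omitted because the t distribution is not available in the Python standard library.

```python
from math import sqrt
from statistics import mean, variance

# Species counts per quadrat, assuming the table splits into
# 13 Valley River and 15 Ridge River quadrats.
valley = [9, 9, 15, 12, 13, 13, 13, 8, 11, 9, 10, 9, 14]
ridge = [13, 10, 6, 7, 10, 9, 18, 6, 9, 9, 11, 7, 8, 6, 11]

def pooled_t(x, y):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # pooled estimate of the common variance
    s2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(s2 * (1 / nx + 1 / ny))
    return t, df

# Drop the single outlying Ridge River quadrat with 18 species.
ridge_trimmed = [v for v in ridge if v != 18]

t_full, df_full = pooled_t(valley, ridge)
t_trim, df_trim = pooled_t(valley, ridge_trimmed)

print(f"with outlier:    t = {t_full:.3f} on {df_full} d.f.")
print(f"without outlier: t = {t_trim:.3f} on {df_trim} d.f.")
```

Removing the outlier both lowers the Ridge River mean and shrinks the pooled variance, so the numerator of the statistic grows while its denominator falls. This is why a single observation moves the result from non-significant to significant at the 0.05 level.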