The Tryptone Task
Group 7
Yuwu Chen
Alfonso R Croeze
Introduction
 Staphylococcus aureus is a bacterium, commonly found on skin and in the
respiratory tract, that can cause ailments such as skin infections and respiratory
diseases.
 Like other bacteria, Staphylococcus aureus can be grown in medical laboratories
to aid in identifying and treating skin conditions.
 Poor growth rates of methicillin-resistant Staphylococcus aureus (MRSA) in one
laboratory prompted the investigators to experiment with different culturing
conditions.
 Five strains of MRSA were examined in this experiment. Due to their complex
names, they are referred to as 1, 2, 3, 4, and 5 in the data.
Data Description
 The tryptone dataset contains bacteria counts after the culturing of five strains
of Staphylococcus aureus.
 The data was collected by Gavin Cooper at the Auckland University of
Technology, New Zealand. The full dataset:
http://www.amstat.org/publications/jse/datasets/Tryptone.dat.txt
 No missing values.
 Analyses: (a) tests on factorial models with interactions to identify significant
factors, (b) optimal conditions estimated by partial differentiation.
Data Description
 Treatments:
 Time - In hours: 24 and 48
 Temperature - Temperature of incubation in degrees Celsius: 27, 35, 43
 Concentration - The concentration of the nutrient tryptone as a
percentage: 0.6, 0.8, 1.0, 1.2, 1.4
 Block:
 Count column - Five count columns: 1, 2, 3, 4, 5
 Redundant variable:
 Row - this is the case number
 Response (dependent) variable:
 Strain counts - Bacteria counts: 3 to 284
Data Management
 Data transformation
The original dataset shows aspects of both multivariate data, where the count
column variable is arranged in columns, and univariate data, where the levels of
the time, temperature and concentration variables respectively are listed in three
columns.
Row  Count1  Count2  Count3  Count4  Count5  Time  Temp  Conc
1    9       3       10      14      33      24    27    0.6
2    16      12      26      20      31      24    27    0.8
Strain counts, which are analyzed in a univariate procedure, are recorded in
different count columns: they must be placed in a single column. The count
column variable should be in its own single column as well.
Data was transformed by SAS code:
Input row count1 count2 count3 count4 count5 time temp conc;
column = 1; count = count1; output strain;
column = 2; count = count2; output strain;
The new dataset (first three observations):
Obs  time  temp  conc  column  count
1    24    27    0.6   1       9
2    24    27    0.6   2       3
3    24    27    0.6   3       10
The new dataset strain and the complete SAS code are in the output files.
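The wide-to-long transformation described above can also be sketched in Python. This is only an illustration of the reshaping logic, not the authors' SAS program: the function name to_long and the dictionary keys are assumptions, and only the two sample rows shown on the slide are used.

```python
# Wide-to-long reshape of the two sample rows shown on the slide (stdlib only).
wide_rows = [
    # row, count1..count5, time, temp, conc
    (1, 9, 3, 10, 14, 33, 24, 27, 0.6),
    (2, 16, 12, 26, 20, 31, 24, 27, 0.8),
]


def to_long(rows):
    """Emit one record per (row, count column), like repeated OUTPUT statements."""
    long_rows = []
    for _row, c1, c2, c3, c4, c5, time, temp, conc in rows:
        for column, count in enumerate((c1, c2, c3, c4, c5), start=1):
            long_rows.append(
                {"time": time, "temp": temp, "conc": conc,
                 "column": column, "count": count}
            )
    return long_rows


strain = to_long(wide_rows)
# Print the first three observations in the same column order as the slide.
for obs, rec in enumerate(strain[:3], start=1):
    print(obs, rec["time"], rec["temp"], rec["conc"], rec["column"], rec["count"])
```

Each wide row yields five long records, so the 30 rows of the full dataset would give the 150 observations used in the analysis.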
Data Management
 Balance check:
When fixing the treatment “time”, the tables below demonstrate that all 15
combinations of the other two treatments (3 temperatures × 5 concentrations) exist,
and that the frequency of replicates in each combination is the same.
Similarly, when fixing variable concentration or temperature, the frequency
tables show that the experiment is balanced. (These results are shown in the
output files.)
 α = 0.05 is used for the entire analysis.
Table 1 of temp by conc, controlling for time=24
Frequency by temp (rows) and conc (columns)
temp    0.6   0.8   1     1.2   1.4   Total
27      5     5     5     5     5     25
35      5     5     5     5     5     25
43      5     5     5     5     5     25
Total   15    15    15    15    15    75
Table 2 of temp by conc, controlling for time=48
Frequency by temp (rows) and conc (columns)
temp    0.6   0.8   1     1.2   1.4   Total
27      5     5     5     5     5     25
35      5     5     5     5     5     25
43      5     5     5     5     5     25
Total   15    15    15    15    15    75
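A balance check of this kind can be sketched in a few lines of Python. The records list below is a hypothetical stand-in for the long-format data (it is generated from the design levels rather than read from the real file), and is_balanced is an illustrative helper, not part of the authors' code.

```python
from collections import Counter
from itertools import product

times = (24, 48)
temps = (27, 35, 43)
concs = (0.6, 0.8, 1.0, 1.2, 1.4)
columns = (1, 2, 3, 4, 5)

# Hypothetical stand-in for the long-format data:
# one record per treatment combination and count column.
records = list(product(times, temps, concs, columns))

# Number of observations in each time x temp x conc cell.
cell_counts = Counter((time, temp, conc) for time, temp, conc, _ in records)


def is_balanced(counts, expected_cells):
    """Balanced if every treatment combination occurs, all with the same frequency."""
    return len(counts) == expected_cells and len(set(counts.values())) == 1


n_cells = len(times) * len(temps) * len(concs)
print(n_cells, is_balanced(cell_counts, n_cells), set(cell_counts.values()))
```

With the real data, records would be built from the strain dataset instead, and the same check would confirm the 5 replicates per cell shown in the frequency tables.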
Data Summary
 Differences in means? Symmetric data? Homogeneous variances?
Figures below (left to right): distribution of count by time, temperature and
concentration.
 First impressions from the box plots:
 In each treatment, means at different levels are quite different.
 In temperature treatments, the data are less symmetric, so possibly not normal.
The other two treatments look more symmetric.
 In each treatment, the variances may not be equal to each other.
Method Description
 Step 1: Test on factorial models with interactions to identify significant factors.
 ANOVA test on factorial RBD, full model:
Variances are estimated separately rather than pooled (heterogeneous variances allowed).
 ANOVA test on factorial RBD, reduced model:
Homogeneous variance is assumed and the variance is pooled.
 Step 2: Test for optimal conditions estimated by partial differentiation.
 Multiple polynomial regression
 The current protocol for culturing this bacterium has the time at 24 hours, the
temperature at 35 degrees Celsius and the tryptone concentration at 1.0%.
Step 1: Test on factorial models with
interactions to identify significant factors
 Full model vs. reduced model: which one is better?
Full model:
Fit Statistics
-2 Res Log Likelihood       1107.3
AIC (Smaller is Better)     1169.3
AICC (Smaller is Better)    1191.9
BIC (Smaller is Better)     1157.2
Reduced model:
Fit Statistics
-2 Res Log Likelihood       1148.5
AIC (Smaller is Better)     1152.5
AICC (Smaller is Better)    1152.6
BIC (Smaller is Better)     1151.7
 The reduced model has the smaller AIC value (1152.5 vs. 1169.3), which indicates
that it is the better model.
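The AIC comparison can be unpacked with a little arithmetic. The sketch below assumes the usual REML-based definition AIC = -2 Res Log Likelihood + 2k, where k is the number of covariance parameters; the slides do not state k, so the values recovered here are implied by the reported statistics rather than taken from the output.

```python
# Fit statistics as reported on the slide.
fit = {
    "full": {"neg2_res_loglik": 1107.3, "aic": 1169.3},
    "reduced": {"neg2_res_loglik": 1148.5, "aic": 1152.5},
}


def implied_param_count(stats):
    """Solve AIC = -2 ResLogLik + 2k for k (assumed penalty form)."""
    return round((stats["aic"] - stats["neg2_res_loglik"]) / 2)


k = {name: implied_param_count(stats) for name, stats in fit.items()}
better = min(fit, key=lambda name: fit[name]["aic"])
print(k, better)
```

Under that assumption the full model fits slightly better in raw likelihood but pays for 31 parameters against 2, which is why its AIC ends up larger.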
 Assumptions: Independence, normal distribution of residuals, homogeneity of
variances
 The sources of variation and degrees of freedom:
Source                               Degrees of freedom     d.f.
Tmt1 (Time)                          t1-1                   1
Tmt2 (Temperature)                   t2-1                   2
Tmt3 (Concentration)                 t3-1                   4
Block (Count column)                 b-1                    4
Interaction1 (Tmt1 * Tmt2)           (t1-1)(t2-1)           2
Interaction2 (Tmt1 * Tmt3)           (t1-1)(t3-1)           4
Interaction3 (Tmt2 * Tmt3)           (t2-1)(t3-1)           8
Interaction4 (Tmt1 * Tmt2 * Tmt3)    (t1-1)(t2-1)(t3-1)     8
Experimental Error                   (b-1)(t1t2t3-1)        116
Total                                bt1t2t3-1              149
Experimental Error d.f. expanded: (b-1)[(t1-1) + (t2-1) + (t3-1) + (t1-1)(t2-1) + (t1-1)(t3-1)
+ (t2-1)(t3-1) + (t1-1)(t2-1)(t3-1)] = 4 × 29 = 116
Block interactions are pooled into a single error term because of the assumption of no block interaction in RBD
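The degrees of freedom in the table can be reproduced directly from the numbers of levels. This is a small arithmetic sketch; the dictionary keys are illustrative labels, not names from the SAS output.

```python
# blocks (count columns), and levels of time, temperature, concentration
b, t1, t2, t3 = 5, 2, 3, 5

df = {
    "time": t1 - 1,
    "temp": t2 - 1,
    "conc": t3 - 1,
    "block": b - 1,
    "time*temp": (t1 - 1) * (t2 - 1),
    "time*conc": (t1 - 1) * (t3 - 1),
    "temp*conc": (t2 - 1) * (t3 - 1),
    "time*temp*conc": (t1 - 1) * (t2 - 1) * (t3 - 1),
}

# Pooled block-by-treatment terms: (b - 1) times the total treatment df.
treatment_df = sum(value for name, value in df.items() if name != "block")
df["error"] = (b - 1) * treatment_df

total_df = b * t1 * t2 * t3 - 1
print(df["error"], total_df, sum(df.values()))
```

The error term comes out at 116, matching the denominator degrees of freedom in the tests of fixed effects, and all sources add up to the total of 149.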
ANOVA Test on factorial RBD, reduced
model
 According to the factorial RBD (reduced) model, do different levels in each
treatment have significantly different effects on strain counts?
Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
time     1        116      444.27    <.0001
temp     2        116      80.12     <.0001
conc     4        116      64.86     <.0001
 Yes, as the p-values of all three treatments are <0.05, we reject H0: μ1 = μ2 = … = μt in
each treatment.
 Is there interaction between treatments?
Type 3 Tests of Fixed Effects
Effect           Num DF   Den DF   F Value   Pr > F
time*temp        2        116      38.07     <.0001
time*conc        4        116      3.99      0.0046
temp*conc        8        116      0.85      0.5613
time*temp*conc   8        116      2.17      0.0343
 The hypothesis of no interaction effect between time & temp was rejected.
 The hypothesis of no interaction effect between time & conc was rejected.
 The hypothesis of no interaction effect between temp & conc was NOT
rejected.
 The hypothesis of no interaction effect among the three treatments was
rejected (p = 0.0343).
ANOVA Test on factorial RBD, reduced
model
 Least Squares Means table gives the least squares estimate, the standard error of
the estimate, etc.:
Least Squares Means
Effect  time  temp  conc  Estimate  Standard Error  DF   t Value  Pr > |t|  Alpha  Lower    Upper
time    24                82.2800   3.4399          116  23.92    <.0001    0.05   75.4668  89.0932
time    48                162.75    3.4399          116  47.31    <.0001    0.05   155.93   169.56
temp          27          91.1200   3.9340          116  23.16    <.0001    0.05   83.3281  98.9119
 Which pairs of means in one treatment are different, at a given condition of the
other treatment levels?
 Pairwise comparisons with Tukey adjustments are shown in the “Differences
of Least Squares Means” table.
 Saxton’s macro was applied to do a range test with the LSMeans output, e.g.:
Effect=time Method=Tukey-Kramer(P<0.05) Set=1
Obs  time  temp  conc  Estimate  Standard Error  Alpha  Lower    Upper    Letter Group
1    48    _     _     162.75    3.4399          0.05   155.93   169.56   A
2    24    _     _     82.2800   3.4399          0.05   75.4668  89.0932  B
The complete tables mentioned above are available in the output file.
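As a consistency check on the Least Squares Means rows shown, the t value is the estimate divided by its standard error, and the confidence limits are the estimate plus or minus a critical t value times the standard error. The sketch below recomputes both from the reported numbers; the critical value is backed out of the limits rather than looked up, so it is an implied quantity, not something printed in the output.

```python
# (label, estimate, standard error, lower, upper) from the Least Squares Means table
lsmeans = [
    ("time=24", 82.2800, 3.4399, 75.4668, 89.0932),
    ("time=48", 162.75, 3.4399, 155.93, 169.56),
    ("temp=27", 91.1200, 3.9340, 83.3281, 98.9119),
]

checks = []
for label, est, se, lower, upper in lsmeans:
    t_value = est / se                          # tests H0: mean = 0
    implied_t_crit = (upper - lower) / (2 * se)  # half-width divided by SE
    checks.append((label, round(t_value, 2), round(implied_t_crit, 3)))

for row in checks:
    print(*row)
```

The implied critical value is about 1.98 in every row, which is what a two-sided 95% interval on 116 degrees of freedom would be expected to use.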
ANOVA Test on factorial RBD, reduced
model
 Last part of the ANOVA is testing the hypothesis of normality:
Tests for Normality
Test                Statistic        p Value
Shapiro-Wilk        W     0.988251   Pr < W     0.2392
Kolmogorov-Smirnov  D     0.050081   Pr > D     >0.1500
Cramer-von Mises    W-Sq  0.040777   Pr > W-Sq  >0.2500
Anderson-Darling    A-Sq  0.333665   Pr > A-Sq  >0.2500
 P-value >0.05 in all four tests, so we fail to reject the hypothesis of normality
in the residual distribution.
 Contrasts to test linear/curved trend
 Temperature and concentration treatments are quantitative and equally spaced,
having 3 levels and 5 levels respectively. (Time has only 2 levels)
Contrasts
Treatment  Label      Num DF  Den DF  F Value  Pr > F
Temp       linear     1       116     57.32    <.0001
Temp       quadratic  1       116     102.93   <.0001
Conc       linear     1       116     189.36   <.0001
Conc       quadratic  1       116     19.69    <.0001
Conc       cubic      1       116     32.80    <.0001
Conc       quartic    1       116     17.59    <.0001
 The results of the contrasts indicate that the linear and all higher-order (curved)
trend components are significant for both temperature and concentration.
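Trend contrasts for equally spaced levels are usually built from orthogonal polynomial coefficients. The slides do not show the coefficients the authors passed to SAS, so the values below are the standard textbook ones for 3 and 5 equally spaced levels, given here as an assumption; the helper is_orthogonal_set is illustrative.

```python
# Standard orthogonal polynomial coefficients for equally spaced levels.
contrasts = {
    3: {"linear": (-1, 0, 1), "quadratic": (1, -2, 1)},
    5: {
        "linear": (-2, -1, 0, 1, 2),
        "quadratic": (2, -1, -2, -1, 2),
        "cubic": (-1, 2, 0, -2, 1),
        "quartic": (1, -4, 6, -4, 1),
    },
}


def is_orthogonal_set(vectors):
    """Each contrast sums to zero and every pair has a zero dot product."""
    vecs = list(vectors)
    sums_ok = all(sum(v) == 0 for v in vecs)
    pairs_ok = all(
        sum(a * b for a, b in zip(vecs[i], vecs[j])) == 0
        for i in range(len(vecs))
        for j in range(i + 1, len(vecs))
    )
    return sums_ok and pairs_ok


for levels, named in contrasts.items():
    print(levels, list(named), is_orthogonal_set(named.values()))
```

With k levels there are k-1 such contrasts, which is why temperature gets two rows in the contrast table and concentration gets four, each with one numerator degree of freedom.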
Step 2: Test for optimal conditions estimated
by partial differentiation
 Multiple polynomial regression
 Three simple polynomial regressions are done separately, each treatment with
one polynomial regression.
 Sequentially adjusted Type I SS were used to determine whether the
polynomial model is as good as the one with a higher order term.
 Regression model:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_k X_i^k + e_i$
 Based on the regression model, partial differentiation is used to determine the
optimal conditions. (Not displayed in this presentation.)
 Also, the fit plots are useful in finding the maxima.
 Assumptions: Independence, normal distribution of residuals, homogeneity of
variances
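Since the differentiation step is not displayed in the presentation, the following is a sketch of how it would go for the quadratic and cubic cases, using the same symbols as the regression model above. Each regression has a single predictor, so the partial derivative reduces to an ordinary derivative in X.

```latex
% Quadratic model: \hat{Y} = \beta_0 + \beta_1 X + \beta_2 X^2
\frac{d\hat{Y}}{dX} = \beta_1 + 2\beta_2 X = 0
\quad\Longrightarrow\quad
X^{*} = -\frac{\beta_1}{2\beta_2},
\qquad \text{a maximum when } \beta_2 < 0.

% Cubic model: \hat{Y} = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3
\frac{d\hat{Y}}{dX} = \beta_1 + 2\beta_2 X + 3\beta_3 X^2 = 0
\quad\Longrightarrow\quad
X^{*} = \frac{-\beta_2 \pm \sqrt{\beta_2^{2} - 3\beta_1\beta_3}}{3\beta_3},
\qquad \text{a maximum where } 2\beta_2 + 6\beta_3 X^{*} < 0.
```

A linear model has a constant derivative, so it has no interior optimum; it can only indicate which end of the tested range gives the higher count.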
Polynomial regression with “Time”
 Time has only 2 levels, so it is fit with a linear model.
 Is the linear effect significant?
Source  DF  Type I SS    Mean Square  F Value  Pr > F
time    1   242808.1667  242808.1667  99.48    <.0001
 Yes: p-value for linear <0.05, reject H0: β1 = 0.
 Fit plot (count vs. time)
Polynomial regression with “Time”
 Polynomial regression model
Parameter  Estimate     Standard Error  t Value  Pr > |t|
Intercept  1.813333333  12.75597911     0.14     0.8872
time       3.352777778  0.33614956      9.97     <.0001
$\text{Count} = 1.813 + 3.353\,\text{Time}$
According to the regression model, the strain count increases with time: 48 hours is
predicted to give a higher strain count than 24 hours. The current protocol for
culturing this bacterium has the time at 24 hours, so the statistical results do NOT
support this protocol.
 Normality test: Shapiro-Wilk p-value <0.05, so we reject the hypothesis of normality
(the Kolmogorov-Smirnov test alone does not reject it)
Tests for Normality
Test                Statistic        p Value
Shapiro-Wilk        W     0.97416    Pr < W     0.0063
Kolmogorov-Smirnov  D     0.064387   Pr > D     0.1302
Cramer-von Mises    W-Sq  0.15658    Pr > W-Sq  0.0204
Anderson-Darling    A-Sq  1.023263   Pr > A-Sq  0.0106
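To make the comparison concrete, the fitted line can be evaluated at the two tested times. This is a small sketch using the parameter estimates from the table; the function name predicted_count is illustrative.

```python
# Parameter estimates from the regression table.
intercept, slope = 1.813333333, 3.352777778


def predicted_count(hours):
    """Predicted strain count from the fitted linear model."""
    return intercept + slope * hours


pred_24 = predicted_count(24)
pred_48 = predicted_count(48)
print(round(pred_24, 2), round(pred_48, 2))
```

The predictions come out at about 82.28 and 162.75, the same as the least squares means for time in the factorial analysis. With only two time points, the model cannot say whether counts keep rising beyond 48 hours.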
Polynomial regression with
“Temperature”
 Temperature has 3 levels, so it is fit with a quadratic model.
 Is the quadratic effect significant?
Source     DF  Type I SS    Mean Square  F Value  Pr > F
temp       1   31329.00000  31329.00000  8.92     0.0033
temp*temp  1   56252.21333  56252.21333  16.01    <.0001
 Yes: p-value for quadratic <0.05, reject H0: β2 = 0.
 Fit plot (count vs. temperature)
Polynomial regression with
“Temperature”
 Polynomial regression model
Parameter  Estimate      Standard Error  t Value  Pr > |t|
Intercept  -713.8343750  191.4866910     -3.73    0.0003
temp       47.1437500    11.2532848      4.19     <.0001
temp*temp  -0.6418750    0.1604124       -4.00    <.0001
$\text{Count} = -713.834 + 47.144\,\text{Temp} - 0.642\,\text{Temp}^2$
According to the regression model, the fitted curve peaks near Temp = 36.7 degrees
(where the derivative 47.144 - 2(0.642)Temp equals zero), and 35 degrees gives the
highest predicted count of the three tested levels. The current protocol for culturing
this bacterium has the temperature at 35 degrees, so the results support this protocol.
 Normality test: Shapiro-Wilk p-value <0.05, so we reject the hypothesis of normality
(the Kolmogorov-Smirnov test alone does not reject it)
Tests for Normality
Test                Statistic        p Value
Shapiro-Wilk        W     0.966924   Pr < W     0.0011
Kolmogorov-Smirnov  D     0.067926   Pr > D     0.0888
Cramer-von Mises    W-Sq  0.182315   Pr > W-Sq  0.0089
Anderson-Darling    A-Sq  1.229754   Pr > A-Sq  <0.0050
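The optimum temperature implied by the fitted quadratic can be worked out from the parameter estimates. The sketch below sets the derivative to zero and also compares predictions at the three tested levels; the variable names are illustrative and the calculation is a reconstruction, since the authors' own differentiation is not shown.

```python
# Parameter estimates from the regression table.
b0, b1, b2 = -713.8343750, 47.1437500, -0.6418750


def predicted_count(temp):
    """Predicted strain count from the fitted quadratic model."""
    return b0 + b1 * temp + b2 * temp ** 2


# Stationary point of the quadratic; it is a maximum because b2 < 0.
temp_opt = -b1 / (2 * b2)

tested_levels = (27, 35, 43)
best_tested = max(tested_levels, key=predicted_count)
print(round(temp_opt, 2), best_tested)
```

The stationary point lies between 36 and 37 degrees, so among the tested levels 35 degrees is both the closest to it and the one with the highest predicted count.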
Polynomial regression with
“Concentration”
 Concentration has 5 levels, so we fit it with a quartic model.
 Is the quartic effect significant?
Source               DF  Type I SS    Mean Square  F Value  Pr > F
conc                 1   103490.6133  103490.6133  32.46    <.0001
conc*conc            1   10761.6095   10761.6095   3.38     0.0682
conc*conc*conc       1   17925.8700   17925.8700   5.62     0.0190
conc*conc*conc*conc  1   9612.8805    9612.8805    3.02     0.0846
 No: p-value for quartic >0.05, do not reject H0: β4 = 0.
Polynomial regression with
“Concentration”
 Now fit it with a cubic model.
 Is the cubic effect significant?
Source          DF  Type I SS    Mean Square  F Value  Pr > F
conc            1   103490.6133  103490.6133  32.02    <.0001
conc*conc       1   10761.6095   10761.6095   3.33     0.0701
conc*conc*conc  1   17925.8700   17925.8700   5.55     0.0198
 Yes: p-value for cubic <0.05, reject H0: β3 = 0.
 Fit plot (count vs. concentration)
Polynomial regression with
“Concentration”
 Polynomial regression model
Parameter       Estimate      Standard Error  t Value  Pr > |t|
Intercept       608.922857    302.692620      2.01     0.0461
conc            -1960.154762  989.107532      -1.98    0.0494
conc*conc       2289.077381   1028.037019     2.23     0.0275
conc*conc*conc  -805.208333   341.898415      -2.36    0.0198
$\text{Count} = 608.923 - 1960.155\,\text{Conc} + 2289.077\,\text{Conc}^2 - 805.208\,\text{Conc}^3$
According to the regression model, the strain count has a maximum near Conc = 1.2%
(the fitted cubic peaks at about 1.24%). The current protocol for culturing this
bacterium has the concentration at 1.0%, so the results do NOT support this protocol.
 Normality test: Shapiro-Wilk p-value <0.05, so we reject the hypothesis of normality
(the Kolmogorov-Smirnov test alone does not reject it)
Tests for Normality
Test                Statistic        p Value
Shapiro-Wilk        W     0.978016   Pr < W     0.0166
Kolmogorov-Smirnov  D     0.069177   Pr > D     0.0787
Cramer-von Mises    W-Sq  0.161641   Pr > W-Sq  0.0179
Anderson-Darling    A-Sq  1.095717   Pr > A-Sq  0.0073
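The concentration optimum can be recovered from the fitted cubic by solving for where its derivative is zero and keeping the root with negative curvature. This is a sketch of that calculation from the parameter estimates; the function and variable names are illustrative, and the authors' own derivation is not shown in the slides.

```python
import math

# Parameter estimates from the regression table.
b0, b1, b2, b3 = 608.922857, -1960.154762, 2289.077381, -805.208333


def predicted_count(conc):
    """Predicted strain count from the fitted cubic model."""
    return b0 + b1 * conc + b2 * conc ** 2 + b3 * conc ** 3


def curvature(conc):
    """Second derivative of the fitted cubic."""
    return 2 * b2 + 6 * b3 * conc


# The first derivative b1 + 2*b2*x + 3*b3*x^2 is a quadratic in x; solve it.
a_coef, b_coef, c_coef = 3 * b3, 2 * b2, b1
disc = b_coef ** 2 - 4 * a_coef * c_coef
roots = [(-b_coef + sign * math.sqrt(disc)) / (2 * a_coef) for sign in (1, -1)]

# Keep the stationary point where the curve bends downward (a local maximum).
conc_opt = next(r for r in roots if curvature(r) < 0)

tested_levels = (0.6, 0.8, 1.0, 1.2, 1.4)
best_tested = max(tested_levels, key=predicted_count)
print(round(conc_opt, 2), best_tested)
```

The other root of the derivative, near 0.65%, is a local minimum, which is why the second-derivative check is needed before calling a stationary point the optimum.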
Conclusion
 Polynomial regression models support the temperature in the current protocol for
culturing Staphylococcus aureus. However, the models do not support the time and
concentration in the protocol.
 An ANOVA test on the factorial RBD was done, and the reduced model is better.
Different levels in each treatment have significantly different effects on strain counts.
The time & temperature, time & concentration, and three-way interactions are significant;
the temperature & concentration interaction is not. Other pairwise comparisons can be
found in the output.
 The polynomial regression models did not meet the assumption of normality
according to the Shapiro-Wilk test (although they do according to the
Kolmogorov-Smirnov test). This might make the data analysis less reliable.
Reference
“Using EDA, ANOVA and Regression to Optimize some Microbiology Data.”
Journal of Statistics Education, Volume 12, Number 2 (July 2004)
http://www.amstat.org/publications/jse/v12n2/datasets.binnie.html
