Tryptone task

The Tryptone Task
Group 7
Yuwu Chen
Alfonso R Croeze

Introduction
 Staphylococcus aureus is a bacterium, commonly found on skin and in the
respiratory tract, that can cause ailments such as skin infections and respiratory
diseases.
 Like other bacteria, Staphylococcus aureus can be grown in medical laboratories
to aid in identifying and treating skin conditions.
 Poor growth rates of Methicillin resistant Staphylococcus aureus (MRSA) in one
laboratory prompted the investigators to experiment with different culturing
conditions.
 Five strains of MRSA were examined in this experiment. Due to their complex
names, they are referred to as 1, 2, 3, 4, and 5 in the data.

Data Description
 The tryptone dataset contains bacteria counts after the culturing of five strains
of Staphylococcus aureus.
 The data was collected by Gavin Cooper at the Auckland University of
Technology, New Zealand. The full dataset:
http://www.amstat.org/publications/jse/datasets/Tryptone.dat.txt
 No missing values.
 Tests on (a) factorial models with interactions to identify significant factors,
(b) optimal conditions estimated by partial differentiation.

Data Description
 Treatments:
 Time - In hours: 24 and 48
 Temperature - Temperature of incubation in degrees Celsius: 27, 35, 43
 Concentration - The concentration of the nutrient tryptone as a
percentage: 0.6, 0.8, 1.0, 1.2, 1.4
 Block:
 Count column - Five count columns: 1, 2, 3, 4, 5
 Redundant variable:
 Row - this is the case number
 Response (dependent) variable:
 Strain counts - Bacteria counts: 3 to 284

Data Management
 Data transformation
The original dataset shows aspects of both multivariate data, where the count
column variable is arranged in columns, and univariate data, where the levels of
the time, temperature and concentration variables respectively are listed in three
columns.
Row Count1 Count2 Count3 Count4 Count5 Time Temp Conc
1 9 3 10 14 33 24 27 0.6
2 16 12 26 20 31 24 27 0.8
Strain counts, which are analyzed in a univariate procedure, are recorded in
different count columns: they must be placed in a single column. The count
column variable should be in its own single column as well.
The data were reshaped with a SAS DATA step (excerpt; the same pattern repeats for
count3 to count5):
input row count1 count2 count3 count4 count5 time temp conc;
column = 1; count = count1; output strain;
column = 2; count = count2; output strain;
The new dataset:
The new dataset strain and the complete SAS code are in the output files.
Obs time temp conc column count
1 24 27 0.6 1 9
2 24 27 0.6 2 3
3 24 27 0.6 3 10
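The wide-to-long reshaping described above can be sketched in Python for illustration (the original analysis used SAS). This is a minimal sketch, assuming only the two sample rows shown on this slide; the names `wide_rows` and `to_long` are hypothetical and not part of the original code.

```python
# Two sample rows from the original (wide) layout:
# row, count1..count5, time, temp, conc
wide_rows = [
    (1, 9, 3, 10, 14, 33, 24, 27, 0.6),
    (2, 16, 12, 26, 20, 31, 24, 27, 0.8),
]


def to_long(rows):
    """Return one (time, temp, conc, column, count) record per count column."""
    long_rows = []
    for _row, c1, c2, c3, c4, c5, time, temp, conc in rows:
        # The case number (_row) is redundant and is dropped here.
        for column, count in enumerate((c1, c2, c3, c4, c5), start=1):
            long_rows.append((time, temp, conc, column, count))
    return long_rows


long_rows = to_long(wide_rows)

# Print the first records in the same column order as the new dataset.
for obs, record in enumerate(long_rows[:3], start=1):
    print(obs, *record)
```

Each wide row yields five long records, so the 30 rows of the full dataset would give the 150 observations used in the analysis.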

Data Management
 Balance check:
When fixing the treatment “time”, the tables below demonstrate that all 15
combinations of the other two treatments (3 temperatures x 5 concentrations) exist,
and that the frequency of replicates in each combination is the same.
Similarly, when fixing variable concentration or temperature, the frequency
tables show that the experiment is balanced. (These results are shown in the
output files.)
 α = 0.05 is used for the entire analysis.
Table 1 of temp by conc (controlling for time=24)
Frequency, rows = temp, columns = conc
temp    0.6   0.8   1.0   1.2   1.4   Total
27        5     5     5     5     5      25
35        5     5     5     5     5      25
43        5     5     5     5     5      25
Total    15    15    15    15    15      75
Table 2 of temp by conc (controlling for time=48)
Frequency, rows = temp, columns = conc
temp    0.6   0.8   1.0   1.2   1.4   Total
27        5     5     5     5     5      25
35        5     5     5     5     5      25
43        5     5     5     5     5      25
Total    15    15    15    15    15      75
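The balance check can be sketched as a cell count over the design. The sketch below is an illustration in Python, not the authors' SAS frequency procedure, and it builds the design grid from the factor levels listed in the data description rather than from the measured counts; with the real data, the same `Counter` would be fed the observed (time, temp, conc) triples.

```python
from collections import Counter
from itertools import product

# Factor levels as listed in the data description.
times = (24, 48)
temps = (27, 35, 43)
concs = (0.6, 0.8, 1.0, 1.2, 1.4)
columns = (1, 2, 3, 4, 5)  # blocks (count columns)

# One record per observation in the long layout.
design = list(product(times, temps, concs, columns))

# Number of replicates in every time x temp x conc cell.
cell_sizes = Counter((time, temp, conc) for time, temp, conc, _col in design)

# The design is balanced when every cell holds the same number of replicates.
is_balanced = len(set(cell_sizes.values())) == 1

print(len(design), len(cell_sizes), is_balanced)
```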

Data Summary
 Differences in means? Symmetric data? Homogeneous variances?
Figures below (left to right): distribution of count by time, temperature and
concentration.
 First impressions from the box plots:
 In each treatment, means at different levels are quite different.
 In temperature treatments, the data are less symmetric, so possibly not normal.
The other two treatments look more symmetric.
 In each treatment, the variances may not be equal to each other.

Method Description
 Step 1: Test on factorial models with interactions to identify significant factors.
 ANOVA test on factorial RBD, full model:
The variances are separated.
 ANOVA test on factorial RBD, reduced model:
Homogeneous variance is assumed and the variance is pooled.
 Step 2: Test for optimal conditions estimated by partial differentiation.
 Multiple polynomial regression
 The current protocols for culturing these bacteria have the time at 24 hours, the
temperature at 35 degrees Celsius and the tryptone concentration at 1.0%.

Step 1: Test on factorial models with
interactions to identify significant factors
 Full model vs. reduced model: which one is better?
Fit Statistics                Full model   Reduced model
-2 Res Log Likelihood         1107.3       1148.5
AIC (Smaller is Better)       1169.3       1152.5
AICC (Smaller is Better)      1191.9       1152.6
BIC (Smaller is Better)       1157.2       1151.7
 The reduced model has the smaller AIC value, which indicates that it is the better
model.
 The sources of variation and degrees of freedom:
 Assumptions: Independence, normal distribution of residuals, homogeneity of
variances
Source                               Degrees of freedom       d.f.
Tmt1 (Time)                          t1-1                     1
Tmt2 (Temperature)                   t2-1                     2
Tmt3 (Concentration)                 t3-1                     4
Block (Count column)                 b-1                      4
Interaction1 (Tmt1 * Tmt2)           (t1-1)(t2-1)             2
Interaction2 (Tmt1 * Tmt3)           (t1-1)(t3-1)             4
Interaction3 (Tmt2 * Tmt3)           (t2-1)(t3-1)             8
Interaction4 (Tmt1 * Tmt2 * Tmt3)    (t1-1)(t2-1)(t3-1)       8
Experimental Error                   (b-1)[(t1-1) + (t2-1) + (t3-1) + (t1-1)(t2-1) + (t1-1)(t3-1) + (t2-1)(t3-1) + (t1-1)(t2-1)(t3-1)]    116
Total                                b*t1*t2*t3-1             149
Block interactions are pooled into a single error term because of the assumption of no block interaction in RBD
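The degrees of freedom in the table can be checked with a few lines of arithmetic. The Python sketch below is only an illustration of the bookkeeping; the variable names are hypothetical, and the level counts (2 times, 3 temperatures, 5 concentrations, 5 blocks) come from the data description.

```python
# Numbers of levels: time, temperature, concentration, and blocks.
t1, t2, t3, b = 2, 3, 5, 5

# Degrees of freedom for the treatment effects and their interactions.
df = {
    "time": t1 - 1,
    "temp": t2 - 1,
    "conc": t3 - 1,
    "time*temp": (t1 - 1) * (t2 - 1),
    "time*conc": (t1 - 1) * (t3 - 1),
    "temp*conc": (t2 - 1) * (t3 - 1),
    "time*temp*conc": (t1 - 1) * (t2 - 1) * (t3 - 1),
}
treatment_df = sum(df.values())  # equals t1*t2*t3 - 1

# Blocks, and the error term that pools every block-by-treatment interaction.
df["block"] = b - 1
df["error"] = (b - 1) * treatment_df

df_total = b * t1 * t2 * t3 - 1

for source, value in df.items():
    print(f"{source:16s}{value:4d}")
print(f"{'total':16s}{df_total:4d}")
```

The error degrees of freedom match the denominator DF of 116 reported in the Type 3 tests.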

ANOVA Test on factorial RBD, reduced
model
 According to the factorial RBD (reduced) model, do different levels in each
treatment have significantly different effects on strain counts?
Type 3 Tests of Fixed Effects
Effect  Num DF  Den DF  F Value  Pr > F
time    1       116     444.27   <.0001
temp    2       116     80.12    <.0001
conc    4       116     64.86    <.0001
 Yes, as p-values of all three treatments are <0.05, we reject H0: μ1 = μ2 = … = μt in
each treatment.
 Is there interaction between treatments?
Type 3 Tests of Fixed Effects
Effect          Num DF  Den DF  F Value  Pr > F
time*temp       2       116     38.07    <.0001
time*conc       4       116     3.99     0.0046
temp*conc       8       116     0.85     0.5613
time*temp*conc  8       116     2.17     0.0343
 The hypothesis of no significant interaction effect between time & temp was rejected.
 The hypothesis of no significant interaction effect between time & conc was rejected.
 The hypothesis of no significant interaction effect between temp & conc was NOT
rejected.
 The hypothesis of no significant interaction effect among the three treatments was
rejected.

ANOVA Test on factorial RBD, reduced model
 Saxton’s Macro was applied to do a range test with the LSMeans output. e.g.:
 Least Squares Means table gives the least squares estimate, the standard error of
the estimate, etc.:
 Which pairs of means in the one treatment are different, at a certain condition of
other treatment levels?
 Pairwise comparisons with TUKEY adjustments are shown in the “Differences
of Least Squares Means” table.
Least Squares Means
Effect  time  temp  conc  Estimate  Std Error  DF   t Value  Pr > |t|  Alpha  Lower    Upper
time    24                82.2800   3.4399     116  23.92    <.0001    0.05   75.4668  89.0932
time    48                162.75    3.4399     116  47.31    <.0001    0.05   155.93   169.56
temp          27          91.1200   3.9340     116  23.16    <.0001    0.05   83.3281  98.9119
Effect=time Method=Tukey-Kramer(P<0.05) Set=1
Obs  time  temp  conc  Estimate  Std Error  Alpha  Lower    Upper    Letter Group
1    48    _     _     162.75    3.4399     0.05   155.93   169.56   A
2    24    _     _     82.2800   3.4399     0.05   75.4668  89.0932  B
The complete tables mentioned above are available in the output file.

ANOVA Test on factorial RBD, reduced model
 Last part of the ANOVA is testing the hypothesis of normality:
 P-value >0.05, so we fail to reject the hypothesis of normality in the residual
distribution.
 Contrasts to test linear/curved trend
 Temperature and concentration treatments are quantitative and equally spaced,
having 3 levels and 5 levels respectively. (Time has only 2 levels)
 All contrasts are significant, so both linear and higher-order (curved) trend
components contribute to the fit for temperature and for concentration.
Contrasts
Treatment  Label      Num DF  Den DF  F Value  Pr > F
Temp       linear     1       116     57.32    <.0001
Temp       quadratic  1       116     102.93   <.0001
Conc       linear     1       116     189.36   <.0001
Conc       quadratic  1       116     19.69    <.0001
Conc       cubic      1       116     32.80    <.0001
Conc       quartic    1       116     17.59    <.0001
Tests for Normality
Test                Statistic        p Value
Shapiro-Wilk        W 0.988251       Pr < W 0.2392
Kolmogorov-Smirnov  D 0.050081       Pr > D >0.1500
Cramer-von Mises    W-Sq 0.040777    Pr > W-Sq >0.2500
Anderson-Darling    A-Sq 0.333665    Pr > A-Sq >0.2500
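Trend contrasts for equally spaced quantitative levels are usually built from orthogonal polynomial coefficients. The slides do not show the coefficients that were used, so the Python sketch below assumes the standard textbook sets for 3 and 5 levels and checks only their defining properties (each sums to zero, and each pair is orthogonal). `CONTRASTS`, `is_orthogonal_set` and `contrast_estimate` are hypothetical names.

```python
from itertools import combinations

# Standard orthogonal polynomial coefficients for equally spaced levels
# (assumed here; the original SAS CONTRAST statements are not shown).
CONTRASTS = {
    "temp": {  # 3 levels: 27, 35, 43
        "linear": (-1, 0, 1),
        "quadratic": (1, -2, 1),
    },
    "conc": {  # 5 levels: 0.6, 0.8, 1.0, 1.2, 1.4
        "linear": (-2, -1, 0, 1, 2),
        "quadratic": (2, -1, -2, -1, 2),
        "cubic": (-1, 2, 0, -2, 1),
        "quartic": (1, -4, 6, -4, 1),
    },
}


def dot(u, v):
    """Inner product of two coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal_set(contrasts):
    """True if every contrast sums to zero and all pairs are orthogonal."""
    vectors = list(contrasts.values())
    sums_to_zero = all(sum(v) == 0 for v in vectors)
    pairwise = all(dot(u, v) == 0 for u, v in combinations(vectors, 2))
    return sums_to_zero and pairwise


def contrast_estimate(coefficients, level_means):
    """Estimated contrast: weighted sum of the level means."""
    return dot(coefficients, level_means)


for factor, contrasts in CONTRASTS.items():
    print(factor, is_orthogonal_set(contrasts))
```

A set of means that lies exactly on a straight line gives a nonzero linear contrast and a zero quadratic contrast, which is why a significant quadratic F value points to curvature.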

Step 2: Test for optimal conditions estimated
by partial differentiation
 Multiple polynomial regression
 Three simple polynomial regressions are done separately, each treatment with
one polynomial regression.
 Sequentially adjusted Type I SS were used to determine whether the
polynomial model is as good as the one with a higher order term.
 Regression model:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_k X_i^k + e_i$
 Based on the regression model, partial differentiation is used to determine the
optimal conditions. (Not displayed in this presentation.)
 Also, the fit plots are useful in finding the maxima.
 Assumptions: Independence, normal distribution of residuals, homogeneity of
variances

Polynomial regression with “Time”
 Is the linear effect significant?
 Fit plot (count vs time)
 Time has only 2 levels, fit with a linear model.
Source DF Type I SS Mean Square F Value Pr > F
time 1 242808.1667 242808.1667 99.48 <.0001
 Yes: p-value for linear <0.05, reject H0: β1 = 0.

Polynomial regression with “Time”
 Polynomial regression model
 Normality test: p-values <0.05 for the Cramer-von Mises and Anderson-Darling tests,
so we reject the hypothesis of normality (the Kolmogorov-Smirnov test alone does not
reject it, p = 0.1302)
Tests for Normality
Test                Statistic        p Value
Kolmogorov-Smirnov  D 0.064387       Pr > D 0.1302
Cramer-von Mises    W-Sq 0.15658     Pr > W-Sq 0.0204
Anderson-Darling    A-Sq 1.023263    Pr > A-Sq 0.0106
Parameter  Estimate     Standard Error  t Value  Pr > |t|
Intercept  1.813333333  12.75597911     0.14     0.8872
time       3.352777778  0.33614956      9.97     <.0001
Count = 1.813 + 3.353*Time
According to the regression model, the strain count increases with time: the fitted
count at 48 hours is higher than at 24 hours. The current protocol for
culturing these bacteria has the time at 24 hours, so the statistical results do NOT
support this protocol.
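The fitted line can be evaluated at the two tested times as a quick check. The Python sketch below is an illustration only; it uses the parameter estimates from the table on this slide, and `predicted_count` is a hypothetical helper name.

```python
# Parameter estimates from the regression of count on time.
INTERCEPT = 1.813333333
SLOPE = 3.352777778


def predicted_count(hours):
    """Fitted strain count for a given incubation time in hours."""
    return INTERCEPT + SLOPE * hours


# Fitted counts at the two tested incubation times.
fitted = {hours: round(predicted_count(hours), 2) for hours in (24, 48)}

# With only two levels the model is a straight line, so there is no
# interior optimum: the sign of the slope decides which level is better.
better_time = max(fitted, key=fitted.get)

print(fitted, better_time)
```

The fitted values agree with the least squares means for time reported earlier (82.28 at 24 hours and 162.75 at 48 hours), as expected for a balanced design with two levels.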

Polynomial regression with
“Temperature”
 Is the quadratic effect significant?
 Fit plot (count vs. temperature)
 Temperature has 3 levels, so it is fit with a quadratic model.
 Yes: p-value for quadratic <0.05, reject H0: β2 = 0.
Source     DF  Type I SS    Mean Square  F Value  Pr > F
temp       1   31329.00000  31329.00000  8.92     0.0033
temp*temp  1   56252.21333  56252.21333  16.01    <.0001

Polynomial regression with “Temperature”
 Normality test: p-value <0.05, so we reject the hypothesis of normality
Count = -713.834 + 47.144*Temp - 0.642*Temp^2
According to the regression model, the fitted curve peaks at about 36.7 degrees, and
among the tested levels the fitted count is highest at Temp = 35 degrees. The current
protocol for culturing these bacteria has the temperature at 35 degrees, so the results
support this protocol.
Tests for Normality
Test              Statistic        p Value
Anderson-Darling  A-Sq 1.229754    Pr > A-Sq <0.0050
Parameter  Estimate      Standard Error  t Value  Pr > |t|
Intercept  -713.8343750  191.4866910     -3.73    0.0003
temp       47.1437500    11.2532848      4.19     <.0001
temp*temp  -0.6418750    0.1604124       -4.00    <.0001
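The differentiation step is not displayed in the presentation, so the following Python sketch shows one way it could be done for the quadratic temperature model: set the derivative b1 + 2*b2*Temp to zero and solve. It uses the parameter estimates from the table on this slide; the helper names are hypothetical.

```python
# Parameter estimates from the quadratic regression of count on temperature.
B0, B1, B2 = -713.8343750, 47.1437500, -0.6418750


def predicted_count(temp):
    """Fitted strain count at a given incubation temperature (degrees C)."""
    return B0 + B1 * temp + B2 * temp ** 2


# d(Count)/d(Temp) = B1 + 2*B2*Temp = 0  ->  Temp = -B1 / (2*B2)
stationary_temp = -B1 / (2 * B2)

# The second derivative is 2*B2; a negative value means a maximum.
is_maximum = 2 * B2 < 0

# Best of the temperatures that were actually tested.
tested_temps = (27, 35, 43)
best_tested = max(tested_temps, key=predicted_count)

print(round(stationary_temp, 1), is_maximum, best_tested)
```

The stationary point lies between the tested levels of 35 and 43 degrees, closer to 35, which is also the tested level with the highest fitted count.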

Polynomial regression with “Concentration”
 Is the quartic effect significant?
 Concentration has 5 levels, so we fit it with a quartic model.
 No: p-value for quartic >0.05, do not reject H0: β4 = 0.
Source               DF  Type I SS    Mean Square  F Value  Pr > F
conc                 1   103490.6133  103490.6133  32.46    <.0001
conc*conc            1   10761.6095   10761.6095   3.38     0.0682
conc*conc*conc       1   17925.8700   17925.8700   5.62     0.0190
conc*conc*conc*conc  1   9612.8805    9612.8805    3.02     0.0846

Polynomial regression with “Concentration”
 Is the cubic effect significant?
 Fit plot (count vs. concentration)
 Now fit it with a cubic model.
 Yes: p-value for cubic <0.05, reject H0: β3 = 0.
Source          DF  Type I SS    Mean Square  F Value  Pr > F
conc            1   103490.6133  103490.6133  32.02    <.0001
conc*conc       1   10761.6095   10761.6095   3.33     0.0701
conc*conc*conc  1   17925.8700   17925.8700   5.55     0.0198

Polynomial regression with “Concentration”
 Normality test: p-value <0.05, reject the hypothesis of normality
Count = 608.923 - 1960.155*Conc + 2289.077*Conc^2 - 805.208*Conc^3
According to the regression model, the strain count has a maximum at about
Conc = 1.24%, close to the tested level of 1.2%.
The current protocol for culturing these bacteria has the concentration at 1.0%, so the
results do NOT support this protocol.
Tests for Normality
Test              Statistic        p Value
Anderson-Darling  A-Sq 1.095717    Pr > A-Sq 0.0073
Parameter       Estimate      Standard Error  t Value  Pr > |t|
Intercept       608.922857    302.692620      2.01     0.0461
conc            -1960.154762  989.107532      -1.98    0.0494
conc*conc       2289.077381   1028.037019     2.23     0.0275
conc*conc*conc  -805.208333   341.898415      -2.36    0.0198
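For the cubic concentration model, the derivative is a quadratic in Conc, so the stationary points follow from the quadratic formula and the sign of the second derivative separates the maximum from the minimum. The Python sketch below illustrates this with the parameter estimates from the table on this slide; it is not the authors' calculation, which the presentation does not display, and the helper names are hypothetical.

```python
import math

# Parameter estimates from the cubic regression of count on concentration.
B0, B1, B2, B3 = 608.922857, -1960.154762, 2289.077381, -805.208333


def predicted_count(conc):
    """Fitted strain count at a given tryptone concentration (percent)."""
    return B0 + B1 * conc + B2 * conc ** 2 + B3 * conc ** 3


def second_derivative(conc):
    """Second derivative of the fitted cubic with respect to Conc."""
    return 2 * B2 + 6 * B3 * conc


# d(Count)/d(Conc) = B1 + 2*B2*Conc + 3*B3*Conc^2 = 0, solved as a*x^2 + b*x + c = 0.
a, b, c = 3 * B3, 2 * B2, B1
discriminant = b ** 2 - 4 * a * c
stationary_points = sorted(
    (-b + sign * math.sqrt(discriminant)) / (2 * a) for sign in (1, -1)
)

# Keep only the stationary points where the curve bends downward.
local_maxima = [x for x in stationary_points if second_derivative(x) < 0]

# Best of the concentrations that were actually tested.
tested_concs = (0.6, 0.8, 1.0, 1.2, 1.4)
best_tested = max(tested_concs, key=predicted_count)

print([round(x, 2) for x in stationary_points], best_tested)
```

The other stationary point is a local minimum near the low end of the tested range, so only the maximum is relevant for choosing a culturing concentration.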

Conclusion
 Polynomial regression models support the temperature in the current protocol for
culturing Staphylococcus aureus. However, the models do not support the time and
concentration in the protocol.
 An ANOVA test on the factorial RBD was done, and the reduced model is better.
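Different levels in each treatment have significantly different effects on strain counts.
There are significant interaction effects between time & temperature and between time &
concentration, and a significant three-way interaction; the temperature & concentration
interaction is not significant. Other pairwise comparisons can be found in the output.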
 The polynomial regression models did not meet the assumption of normality
according to the Shapiro-Wilk test (although they do according to the
Kolmogorov-Smirnov test). This might make the data analysis less reliable.

Reference
“Using EDA, ANOVA and Regression to Optimize some Microbiology Data.”
Journal of Statistics Education, Volume 12, Number 2 (July 2004)
http://www.amstat.org/publications/jse/v12n2/datasets.binnie.html
