R code descriptive statistics of phenotypic data by Avjinder Kaler

R Code for Descriptive Statistics of Phenotypic Data
Avjinder Singh Kaler

Steps
1. Reading data into R. The data file includes a header row, and "NA" marks a missing value.
data_in <- read.table(file = "example.dat", header = TRUE, na.strings = "NA")
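The following is a minimal sketch of the same call. It uses a small hypothetical data set typed inline through read.table()'s text argument, so it runs without example.dat. The column names animal, sex, year and FWEC are illustrative. The sketch also counts missing values per column. Note that na.strings = "NA" is already read.table()'s default, so the argument matters mainly when missing values are coded differently (e.g. "." or "-9").

```r
# Hypothetical file contents passed via 'text'; read.table() parses them
# the same way it would parse a file on disk
raw <- "animal sex year FWEC
1 M 2010 250
2 F 2010 NA
3 F 2011 1200"

data_demo <- read.table(text = raw, header = TRUE, na.strings = "NA")

str(data_demo)              # column types chosen by read.table()
colSums(is.na(data_demo))   # number of missing values in each column
```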
2. Attaching the data frame so that its columns can be referred to by name (e.g., FWEC instead of data_in$FWEC).
attach(data_in)
3. Overview of data (mean, median, maximum, minimum, first
and third quartiles, number of missing values).
summary(data_in)
4. Calculating means and standard deviations. The 2 (MARGIN) applies the function to each column, and na.rm = TRUE removes missing values before the calculation.
apply(data_in, 2, mean, na.rm = TRUE)
apply(data_in, 2, sd, na.rm = TRUE)
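apply() first converts the data frame to a matrix. A single character column such as sex therefore turns every value into text, and mean() returns NA with warnings. The sketch below restricts the calculation to numeric columns. It uses a small hypothetical data frame (data_demo) rather than data_in.

```r
# Hypothetical data frame mixing a character column with numeric columns
data_demo <- data.frame(
  sex  = c("M", "F", "F", "M"),
  FWEC = c(250, NA, 1200, 800),
  wt   = c(30.5, 28.0, 31.2, 29.9)
)

num_cols <- sapply(data_demo, is.numeric)   # TRUE only for numeric columns
means <- sapply(data_demo[num_cols], mean, na.rm = TRUE)
sds   <- sapply(data_demo[num_cols], sd,   na.rm = TRUE)
round(rbind(mean = means, sd = sds), 2)
```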
5. Means or standard deviations can be calculated for data points grouped by a factor (e.g., year).
aggregate(data_in, list(year = data_in$year), FUN = mean, na.rm = TRUE)
aggregate(data_in, list(year = data_in$year), FUN = sd, na.rm = TRUE)
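aggregate() also has a formula interface that summarizes one response at a time. The sketch below uses a hypothetical data_demo frame. With the formula interface, rows containing NA are dropped by the default na.action = na.omit. tapply() is shown with na.rm = TRUE for comparison.

```r
# Hypothetical records: one missing FWEC value in 2011
data_demo <- data.frame(
  year = c(2010, 2010, 2011, 2011, 2011),
  FWEC = c(200, 400, NA, 900, 1500)
)

# Formula interface: mean FWEC per year, NA rows removed by na.omit
agg_mean <- aggregate(FWEC ~ year, data = data_demo, FUN = mean)
agg_mean

# Same grouping with tapply(), handling NA explicitly
tap_mean <- tapply(data_demo$FWEC, data_demo$year, mean, na.rm = TRUE)
tap_mean
```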
6. Frequencies for a single variable and across two variables.
table(sex)
table(wormy)
table(wormy,sex)
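Counts are often easier to read with totals and within-group proportions. The sketch below applies addmargins() and prop.table() from base R to hypothetical sex and wormy values.

```r
# Hypothetical records
data_demo <- data.frame(
  sex   = c("M", "F", "F", "M", "F", "M"),
  wormy = c("yes", "no", "yes", "yes", "no", "no")
)

tab <- table(data_demo$wormy, data_demo$sex, dnn = c("wormy", "sex"))
addmargins(tab)                        # counts with row and column totals
props <- prop.table(tab, margin = 2)   # proportion wormy within each sex
props
```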

7. Histogram.
hist(FWEC)
8. Scatter plot. Given a single vector, plot() draws each value against its index.
plot(FWEC)
9. Box plot of FWEC by sex. Setting range = 0 extends the whiskers to the minimum and maximum.
boxplot(FWEC ~ sex, data = data_in, range = 0)
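boxplot() passes range to boxplot.stats() as its coef argument, so the effect of range = 0 can be checked numerically. The sketch below uses a hypothetical vector with one extreme value. With coef = 0, the whiskers reach the minimum and maximum and no points are flagged. The default rule (1.5 times the box length) flags the extreme value as an outlier.

```r
x <- c(5, 7, 8, 9, 10, 12, 40)   # hypothetical counts with one extreme value

s0 <- boxplot.stats(x, coef = 0)    # same as boxplot(..., range = 0)
s1 <- boxplot.stats(x, coef = 1.5)  # boxplot() default

s0$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
s0$out     # nothing flagged: whiskers span the full data range
s1$stats   # upper whisker pulled back to the largest non-outlier
s1$out     # the extreme value
```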
10. Shapiro–Wilk test to check whether the data are normally distributed.
shapiro.test(FWEC)
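Shapiro–Wilk can be run on the whole vector, on a transformed version, or within each level of a factor. The sketch below uses simulated, right-skewed counts: rnbinom() stands in for real FWEC values, and the sex labels are hypothetical. W lies between 0 and 1, and values closer to 1 indicate a closer fit to normality. shapiro.test() accepts between 3 and 5000 non-missing values.

```r
set.seed(1)
# Simulated right-skewed counts standing in for FWEC (hypothetical)
counts <- rnbinom(100, size = 1, mu = 500)
sex <- rep(c("F", "M"), each = 50)

sw_raw <- shapiro.test(counts)
sw_log <- shapiro.test(log(counts + 1))
c(raw = sw_raw$statistic, log = sw_log$statistic)

# p-value of the test within each sex separately
p_by_sex <- sapply(split(counts, sex), function(v) shapiro.test(v)$p.value)
p_by_sex
```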
11. Checking the data distribution with a normal QQ plot. If the data are normally distributed, the plotted points fall close to the reference line.
qqnorm(FWEC)
qqline(FWEC)
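qqnorm() can return the quantiles instead of drawing them, which allows a rough numeric check of how straight the QQ plot is. The sketch below uses simulated normal and exponential data. The correlation between theoretical and sample quantiles serves here only as an informal summary, not as a formal test.

```r
set.seed(2)
normal_x <- rnorm(200)   # simulated normal data
skewed_x <- rexp(200)    # simulated right-skewed data

# plot.it = FALSE returns theoretical (x) and sample (y) quantiles
q_norm <- qqnorm(normal_x, plot.it = FALSE)
q_skew <- qqnorm(skewed_x, plot.it = FALSE)

# Correlation near 1 means the QQ points lie close to a straight line
r_norm <- cor(q_norm$x, q_norm$y)
r_skew <- cor(q_skew$x, q_skew$y)
c(normal = r_norm, skewed = r_skew)
```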
12. Data transformation: log, square root, and cube root.
log_FWEC <- log(FWEC)   # zero counts give -Inf; log(FWEC + 1) avoids this
sqrt_FWEC <- sqrt(FWEC + 1)
cbrt_FWEC <- FWEC^(1/3)
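Two pitfalls are worth checking before transforming. First, log(0) is -Inf, which matters for count data containing zeros. Second, in R a non-integer power of a negative number is NaN, so x^(1/3) fails for negative values such as centred data or residuals (FWEC itself is non-negative). The sketch below shows a sign-preserving cube root, with cbrt as a hypothetical helper name, and log1p(), which computes log(x + 1).

```r
# Hypothetical helper: cube root that keeps the sign of negative values
cbrt <- function(x) sign(x) * abs(x)^(1/3)

(-8)^(1/3)            # NaN: non-integer power of a negative number
cbrt(c(-8, 0, 27))    # approximately -2, 0, 3

counts <- c(0, 9, 99)
log(counts)           # -Inf for the zero count
log1p(counts)         # log(counts + 1), finite for zeros
```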

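13. Box–Cox transformation.
# Code to find a suitable lambda for Y to the power lambda
# MASS ships with R as a recommended package; load it
library(MASS)
# seq(min value, max value, step) defines the range from which lambda is drawn;
# the response must be strictly positive (add a constant first if FWEC has zeros)
bc <- boxcox(FWEC ~ factor(sex) + factor(birth_rearing_type),
             lambda = seq(0, 1.0, 0.01))
# savePlot() works on the Windows (and cairo-based X11) screen device
savePlot("boxcox", "jpeg")
# lambda with the maximum log-likelihood, i.e. the peak of the graph
lambda <- bc$x[which.max(bc$y)]
# Box–Cox formula; if lambda is 0, use log(FWEC) instead
trans_FWEC <- ((FWEC^lambda) - 1) / lambda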
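The following is a self-contained sketch of the same Box–Cox steps on simulated log-normal data; the y and sex values are hypothetical. With plotit = FALSE, boxcox() returns the lambda grid (x) and the profile log-likelihood (y), so the lambda at the peak can be picked programmatically instead of read off the graph. The helper bc_transform() is an illustrative name. It uses log(y) when lambda is 0, which is the limit of the Box–Cox formula.

```r
library(MASS)   # provides boxcox(); included with R

set.seed(3)
# Simulated positive, right-skewed response and one factor (hypothetical)
sex <- factor(rep(c("F", "M"), each = 60))
y <- rlnorm(120, meanlog = ifelse(sex == "M", 6, 5.5), sdlog = 0.8)

# plotit = FALSE returns the profile log-likelihood instead of drawing it
bc <- boxcox(y ~ sex, lambda = seq(-1, 1, 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat   # for log-normal data this typically lies near 0

# Box–Cox transform; lambda = 0 is defined as log(y)
bc_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
y_bc <- bc_transform(y, lambda_hat)
```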
14. Checking homogeneity of variances.
# leveneTest() is provided by the car package; install.packages("car") if needed
library(car)
# run Levene's test, specifying the data vector y and group, the factor across
# which the variances are tested (e.g., factor(year))
leveneTest(y, group)
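With its default center = median, car's leveneTest() is a one-way ANOVA on the absolute deviations of each observation from its group median (the Brown–Forsythe variant). The base-R sketch below reproduces that computation on simulated data to show what the test does. The year labels are hypothetical. For real analyses, leveneTest() is the simpler call.

```r
set.seed(4)
# Simulated data: the third year is three times as variable (hypothetical)
group <- factor(rep(c("2010", "2011", "2012"), each = 40))
y <- rnorm(120, mean = 10, sd = rep(c(1, 1, 3), each = 40))

# Absolute deviation of each observation from its group median
abs_dev <- abs(y - ave(y, group, FUN = median))

# One-way ANOVA on those deviations; F and p should match leveneTest(y, group)
lev <- anova(lm(abs_dev ~ group))
lev[["F value"]][1]
lev[["Pr(>F)"]][1]
```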
15. Fitting a linear model and ANOVA.
# load the "car" package for Type III ANOVA
library(car)

lmod <- lm(cbrt_FWEC ~ factor(sex))
# Type I ANOVA
anova(lmod)
# Type III ANOVA: note that the first letter of the command below is a capital "A"
# (make sure the "car" package is loaded as shown above); Type III tests are
# normally run with sum-to-zero contrasts, set before fitting with
# options(contrasts = c("contr.sum", "contr.poly"))
Anova(lmod, type = "III")
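In unbalanced designs, Type I (sequential) sums of squares depend on the order in which terms enter the model. The sketch below uses a simulated, unbalanced layout of sex by birth type, where btype is a hypothetical variable. The sum of squares for sex changes with its position in the formula. drop1() from base R tests each term adjusted for all others; for this main-effects model, that matches sex entered last.

```r
set.seed(5)
# Simulated unbalanced design: sex and birth type are unevenly crossed
sex   <- factor(c(rep("F", 30), rep("M", 20)))
btype <- factor(c(rep("single", 22), rep("twin", 8),
                  rep("single", 6),  rep("twin", 14)))
y <- 2 + 0.5 * (sex == "M") + 0.8 * (btype == "twin") + rnorm(50)

a1 <- anova(lm(y ~ sex + btype))   # sex entered first
a2 <- anova(lm(y ~ btype + sex))   # sex entered last
c(first = a1["sex", "Sum Sq"], last = a2["sex", "Sum Sq"])

# Each term tested after adjusting for all other terms
d <- drop1(lm(y ~ sex + btype), test = "F")
d
```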
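16. Addressing confounding of explanatory variables in a linear model. lmod1 fits birth type and rearing type as separate factors with their interaction. lmod2 replaces them with a single combined birth_rearing_type factor.
lmod1 <- lm(cbrt_FWEC ~ factor(sex) + factor(birth_type) * factor(rearing_type))
lmod2 <- lm(cbrt_FWEC ~ factor(sex) + factor(birth_rearing_type))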
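The sketch below builds a combined birth_rearing_type factor from two hypothetical columns with interaction(). It also shows why the combined factor helps. When one birth/rearing combination is never observed, the model with separate factors and their interaction cannot estimate every coefficient, and lm() reports the aliased one as NA. All values are made up for illustration.

```r
# Hypothetical records; no "single birth, twin rearing" animals are present
birth_type   <- c("single", "twin", "twin", "single", "twin", "twin")
rearing_type <- c("single", "twin", "single", "single", "twin", "single")
y            <- c(5.1, 6.3, 5.8, 4.9, 6.6, 6.0)

# One level per observed combination; drop = TRUE removes unobserved ones
birth_rearing_type <- interaction(birth_type, rearing_type,
                                  sep = "_", drop = TRUE)
table(birth_rearing_type)

# Separate factors with interaction: one coefficient cannot be estimated
fit_sep <- lm(y ~ factor(birth_type) * factor(rearing_type))
coef(fit_sep)

# Combined factor: every level has its own estimable effect
fit_comb <- lm(y ~ birth_rearing_type)
coef(fit_comb)
```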
17. Check the difference with an ANOVA.
Anova(lmod1,type="III")
Anova(lmod2,type="III")
18. Model comparison using logistic regression for binary data.
logres <- glm(wormy_status ~ factor(sex) + factor(birth_rearing_type),
              family = binomial(link = "logit"))

# produce an analysis-of-deviance table to test fixed effects
anova(logres, test = "Chisq")
# deviance of the model (the lower, the better the fit)
deviance(logres)
# the difference in deviance can be formally tested with a likelihood ratio test
# comparing two nested models ("nested" means that one has one more factor than the other)
logres1 <- glm(wormy_status ~ factor(sex), family = binomial(link = "logit"))
logres2 <- glm(wormy_status ~ factor(sex) + factor(birth_rearing_type),
               family = binomial(link = "logit"))
anova(logres1, logres2, test = "Chisq")
# models that include a random effect need glmer() from lme4 instead: lmer() with a
# family argument is deprecated, and glmer() uses the Laplace approximation by default
# to assess the model, plot predicted probability against observed proportion;
# plot.logistic.fit.fnc() is documented for models fitted with lrm() (rms) or
# glmer() (lme4), so logres1 may need to be refitted with one of these
# install.packages("languageR") if needed
library(languageR)
plot.logistic.fit.fnc(logres1, data_in)
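The likelihood ratio test behind anova(..., test = "Chisq") can be reproduced by hand. For 0/1 data, the drop in deviance between nested models equals twice their log-likelihood difference. This drop is compared to a chi-square distribution whose degrees of freedom equal the number of extra parameters. The sketch below uses simulated binary data; the sex and birth_rearing_type values are hypothetical.

```r
set.seed(6)
n <- 300
# Simulated predictors and binary outcome (hypothetical)
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
birth_rearing_type <- factor(sample(c("single_single", "twin_single", "twin_twin"),
                                    n, replace = TRUE))
eta <- -0.5 + 0.6 * (sex == "M") + 0.4 * (birth_rearing_type == "twin_twin")
wormy_status <- rbinom(n, 1, plogis(eta))

logres1 <- glm(wormy_status ~ sex, family = binomial(link = "logit"))
logres2 <- glm(wormy_status ~ sex + birth_rearing_type,
               family = binomial(link = "logit"))

# Likelihood ratio test by hand: drop in deviance vs. chi-square
lr_stat  <- deviance(logres1) - deviance(logres2)
lr_df    <- df.residual(logres1) - df.residual(logres2)
p_manual <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# The same test from anova()
lrt <- anova(logres1, logres2, test = "Chisq")
p_anova <- lrt[["Pr(>Chi)"]][2]
c(manual = p_manual, anova = p_anova)
```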

19. Model diagnostics.
# the following produces plots of residuals vs. fitted values, a normal QQ plot,
# a scale-location plot, and residuals vs. leverage for the previously fitted
# model 1 (lmod1)
plot(lmod1)
# assess a logit model for binary data by plotting the predicted probability
# against observed proportions (the model should be an lrm() or glmer() fit)
# install.packages("languageR") if needed
library(languageR)
plot.logistic.fit.fnc(logres1, data_in)
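If languageR is not available, or the model is an ordinary glm() fit, a similar check can be drawn in base R. The approach bins the predicted probabilities and plots the observed proportion in each bin. The sketch below uses simulated data, and the ten equal-width bins are an arbitrary choice.

```r
set.seed(7)
n <- 500
# Simulated predictor and binary outcome (hypothetical)
x <- rnorm(n)
status <- rbinom(n, 1, plogis(-0.3 + 1.2 * x))
fit <- glm(status ~ x, family = binomial)

p_hat <- fitted(fit)   # predicted probabilities
bins  <- cut(p_hat, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
calib <- data.frame(
  predicted = tapply(p_hat, bins, mean),   # mean prediction per bin
  observed  = tapply(status, bins, mean)   # observed proportion per bin
)
calib <- calib[complete.cases(calib), ]    # drop empty bins

plot(calib$predicted, calib$observed, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Mean predicted probability", ylab = "Observed proportion")
abline(0, 1, lty = 2)   # points near this line indicate good calibration
```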
20. Extracting residuals and writing them to a file, assuming lmod2 is the model of choice.
res_lmod2 <- residuals(lmod2)
write.table(res_lmod2, file = "res_FWEC")
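The sketch below writes residuals with a named column and reads them back to confirm the round trip. It uses R's built-in cars data as a stand-in for the phenotypic data. If the model was fitted on data with missing values, fitting it with na.action = na.exclude makes residuals() pad the dropped rows with NA, so the output lines up with the original records.

```r
fit <- lm(dist ~ speed, data = cars)   # built-in data as a stand-in
res <- residuals(fit)

out_file <- tempfile(fileext = ".txt")
# One named column; row names keep the link to the original observations
write.table(data.frame(residual = res), file = out_file,
            quote = FALSE, sep = "\t")

# The header has one field fewer than the data rows, so read.table()
# uses the first column as row names
res_back <- read.table(out_file, header = TRUE, sep = "\t")
head(res_back)
```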
