R Code for Descriptive Statistics of Phenotypic Data
Avjinder Singh Kaler
Steps
1. Reading data into R—the data file includes a header and
“NA” is a missing value.
data_in <- read.table(file = "example.dat", header = TRUE, na.strings = "NA")
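A minimal, self-contained sketch of step 1. The file contents, the column names (`animal`, `sex`, `FWEC`), and the temporary file are made up for illustration; they stand in for `example.dat`, whose layout the slides do not show.

```r
# Hypothetical stand-in for example.dat: a header line, then rows with NA for missing values
tmp <- tempfile(fileext = ".dat")
writeLines(c("animal sex FWEC",
             "1 male 120",
             "2 female NA",
             "3 female 300"), tmp)

# header = TRUE reads the first line as column names;
# na.strings = "NA" converts the text NA to a missing value
demo_in <- read.table(file = tmp, header = TRUE, na.strings = "NA")

str(demo_in)                          # column types as R parsed them
n_missing <- colSums(is.na(demo_in))  # number of missing values per column
print(n_missing)
```

Checking `str()` and the missing-value counts right after import is a cheap way to catch a wrong separator or an unexpected missing-value code before any statistics are computed.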
2. Getting access to the data frame (all variables will relate to
this data frame).
attach(data_in)
3. Overview of data (mean, median, maximum, minimum, first
and third quartiles, number of missing values).
summary(data_in)
4. Calculating means and standard deviations (2 indicates that
the function is applied to all columns, na.rm=T means that
missing values are removed).
apply(data_in,2,mean,na.rm=T)
apply(data_in,2,sd,na.rm=T)
5. Means or standard deviations can be calculated for data
points grouped by a factor (e.g. year).
aggregate(data_in,list(data_in$year),FUN=mean)
aggregate(data_in,list(data_in$year),FUN=sd)
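Steps 4 and 5 can be sketched on a small made-up data frame. The columns `year`, `sex`, and `FWEC` and their values are assumptions for illustration only. The sketch restricts the column summaries to numeric columns and uses the formula interface of `aggregate()`, which is one option among several.

```r
# Illustrative data only; not taken from the slides
demo <- data.frame(
  year = rep(c(2001, 2002), each = 5),
  sex  = rep(c("female", "male"), times = 5),
  FWEC = c(100, 250, NA, 400, 150, 300, 50, 500, 200, NA)
)

# Restrict to numeric columns so that mean() and sd() are defined for every column
num_cols  <- sapply(demo, is.numeric)
col_means <- sapply(demo[num_cols], mean, na.rm = TRUE)
col_sds   <- sapply(demo[num_cols], sd,   na.rm = TRUE)
print(col_means)
print(col_sds)

# Formula interface: FWEC summarised within each year.
# na.action = na.pass keeps rows with NA so that na.rm = TRUE can drop them inside mean()
by_year <- aggregate(FWEC ~ year, data = demo, FUN = mean,
                     na.rm = TRUE, na.action = na.pass)
print(by_year)
```

Without `na.rm = TRUE`, any group that contains a missing value returns `NA` for its mean or standard deviation.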
6. Frequencies for a single variable and across two variables.
table(sex)
table(wormy)
table(wormy,sex)
7. Histogram.
hist(FWEC)
8. xy scatter plots.
plot(FWEC)
9. Box plot.
boxplot(FWEC ~ sex, data = data_in, range = 0)
10. Shapiro–Wilk test to check normality of the data
distribution.
shapiro.test(FWEC)
11. Checking data distribution with QQ plot—if data are
normally distributed, the plotted data and the line are well
aligned.
qqnorm(FWEC)
qqline(FWEC)
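To see how the two normality checks behave, the sketch below applies them to simulated data: one roughly normal sample and one right-skewed sample. Both samples are invented for illustration. It uses `qqnorm()`, the base R function for a normal QQ plot, with `plot.it = FALSE` so that the QQ coordinates can be inspected numerically.

```r
set.seed(42)
normal_x <- rnorm(200, mean = 10, sd = 2)  # simulated, roughly normal
skewed_x <- rexp(200, rate = 0.5)          # simulated, right-skewed

# A small p-value is evidence against normality
sw_normal <- shapiro.test(normal_x)
sw_skewed <- shapiro.test(skewed_x)
print(c(normal = sw_normal$p.value, skewed = sw_skewed$p.value))

# plot.it = FALSE returns the QQ coordinates instead of drawing them
qq_normal <- qqnorm(normal_x, plot.it = FALSE)
qq_skewed <- qqnorm(skewed_x, plot.it = FALSE)

# Correlation between theoretical and sample quantiles:
# the closer to 1, the straighter the QQ plot
r_normal <- cor(qq_normal$x, qq_normal$y)
r_skewed <- cor(qq_skewed$x, qq_skewed$y)
print(c(normal = r_normal, skewed = r_skewed))
```

Dropping `plot.it = FALSE` and following the call with `qqline()` draws the usual plot with its reference line.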
12. Data transformation—log, square root, and cube root
transformation.
log_FWEC <- log(FWEC)
sqrt_FWEC <- sqrt(FWEC+1)
cbrt_FWEC <- (FWEC)^(1/3)
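The slides do not say whether FWEC contains zeros. Assuming it is a count that can be zero, the sketch below shows how each transformation treats a zero; the `counts` vector is made up for illustration.

```r
# Hypothetical counts including zeros
counts <- c(0, 0, 50, 100, 400, 1600, 6400)

log_raw   <- log(counts)      # log(0) is -Inf
log_plus1 <- log(counts + 1)  # adding 1 before taking logs keeps zeros finite
sqrt_c    <- sqrt(counts + 1)
cbrt_c    <- counts^(1/3)     # defined at zero, so no shift is needed

print(data.frame(counts, log_raw, log_plus1, sqrt_c, cbrt_c))
```

An infinite value in a transformed response makes a later `lm()` call fail, so it is worth checking with `all(is.finite(...))` before fitting a model.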
13. Box–Cox transformation.
#Code to find a suitable lambda for Y to the power lambda
#install and load library(MASS)
library(MASS)
#seq(min value, max value, step) defines the range
#from which lambda is drawn
boxcox(FWEC ~ factor(sex) + factor(birth_rearing_type),
    lambda = seq(0, 1.0, 0.01))
savePlot("boxcox","jpeg")
#insert the lambda at the maximum of the curve in the graph here
lambda <- NA
trans_FWEC <- ((FWEC^lambda)-1)/lambda
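Instead of reading lambda off the graph, it can be taken from the value that `boxcox()` returns. The sketch below is one way to do that on simulated data; the response `y`, the factor `grp`, and the wider lambda grid are assumptions for illustration. Box-Cox requires a strictly positive response.

```r
library(MASS)  # provides boxcox()

set.seed(7)
# Simulated positive response and a two-level factor (illustrative only)
d <- data.frame(grp = factor(rep(c("a", "b"), each = 50)))
d$y <- exp(rnorm(100, mean = ifelse(d$grp == "a", 1, 1.5), sd = 0.4))

# plotit = FALSE returns the lambda grid (x) and the profile log-likelihood (y)
bc <- boxcox(y ~ grp, data = d, lambda = seq(-1, 1, 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)

# Box-Cox transform; its limit as lambda approaches 0 is log(y)
y_trans <- if (abs(lambda_hat) < 1e-8) {
  log(d$y)
} else {
  (d$y^lambda_hat - 1) / lambda_hat
}
```

The `lambda_hat` found this way is the grid point with the highest log-likelihood, so its precision is limited by the step passed to `seq()`.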
14. Checking homogeneity of variances.
#install and load library (Rcmdr); leveneTest() itself comes from the "car" package
library(Rcmdr)
#run Levene's test, specifying the vector of
#data y and group, the factor across which the variances
#are tested (e.g., year)
leveneTest(y,group)
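What Levene's test computes can be sketched in base R: a one-way ANOVA on the absolute deviations of each value from its group centre. The version below centres on the group median, which is also the default in `car::leveneTest()`. The data are simulated and the group labels are made up.

```r
set.seed(3)
# Simulated data: three groups, the third with a deliberately larger spread
group <- factor(rep(c("2001", "2002", "2003"), each = 30))
y <- rnorm(90, mean = 10, sd = rep(c(1, 1, 4), each = 30))

# Absolute deviation of each value from its group median
abs_dev <- abs(y - ave(y, group, FUN = median))

# One-way ANOVA on the deviations; its F test is the median-centred Levene test
lev   <- anova(lm(abs_dev ~ group))
lev_p <- lev[["Pr(>F)"]][1]
print(lev)
```

A small p-value here suggests that the group variances differ, which argues for a transformation or a model that allows unequal variances.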
15. Fitting a linear model and ANOVA.
#need to load the "car" package for Type III ANOVA
library(car)
lmod <- lm(cbrt_FWEC ~ factor(sex))
#Type I ANOVA
anova(lmod)
#Type III ANOVA: note that the first letter in the
#command below has to be a capital "A" (ensure that
#you loaded the "car" package as shown above)
Anova(lmod, type = "III")
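Why the type of ANOVA matters can be shown in base R with an unbalanced design. Type I (sequential) sums of squares depend on the order in which terms enter the model, while a test of each term after all the others does not. The factors `a` and `b` and the response are simulated for illustration, and `drop1()` is a base R stand-in used here, not the `car` function from the step above.

```r
set.seed(11)
# Simulated unbalanced design with two associated factors (illustrative only)
a <- factor(c(rep("a1", 30), rep("a2", 10)))
b <- factor(c(rep("b1", 25), rep("b2", 5), rep("b1", 2), rep("b2", 8)))
y <- 5 + 1.0 * (a == "a2") + 1.5 * (b == "b2") + rnorm(40)

# Sequential sum of squares for a, entered first and entered last
ss_a_first <- anova(lm(y ~ a + b))["a", "Sum Sq"]
ss_a_last  <- anova(lm(y ~ b + a))["a", "Sum Sq"]
print(c(a_first = ss_a_first, a_last = ss_a_last))

# drop1() tests each term after all the others, so term order does not matter
marginal <- drop1(lm(y ~ a + b), test = "F")
print(marginal)
```

In a model with main effects only, the sum of squares that `drop1()` reports for a term equals its sequential sum of squares when that term is entered last.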
16. Addressing confounding of explanatory variables in a linear
model.
lmod1 <- lm(cbrt_FWEC ~ factor(sex) + factor(birth_type) * factor(rearing_type))
lmod2 <- lm(cbrt_FWEC ~ factor(sex) + factor(birth_rearing_type))
17. Check the difference with an ANOVA.
Anova(lmod1,type="III")
Anova(lmod2,type="III")
18. Model comparison using logistic regression for binary data.
logres <- glm(formula = wormy_status ~ factor(sex) +
    factor(birth_rearing_type), family = binomial(link = "logit"))
#producing an analysis-of-deviance table to test
#fixed effects
anova(logres,test="Chisq")
#produces the residual deviance of the model (the lower the
#better the fit)
summary(logres)$deviance
#the difference in deviance can be formally tested
#with a log-likelihood ratio test
#comparing two nested models ("nested" means that
#one has one more factor than the other)
#glm() is used here because neither formula contains a
#random-effects term, which lme4 requires; with random effects,
#fit both models with glmer() from library(lme4) instead
logres1 <- glm(wormy_status ~ factor(sex),
    family = binomial(link = "logit"))
logres2 <- glm(wormy_status ~ factor(sex) + factor(birth_rearing_type),
    family = binomial(link = "logit"))
anova(logres1, logres2, test = "Chisq")
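The likelihood ratio test for two nested logistic models can also be computed by hand from the deviances of two `glm()` fits, which makes explicit what the comparison table reports. The data below are simulated, and the names `status`, `sex`, and `type` are placeholders rather than the variables in the slides.

```r
set.seed(5)
# Simulated binary outcome and two factors (names are illustrative only)
n <- 200
sex  <- factor(sample(c("female", "male"), n, replace = TRUE))
type <- factor(sample(c("single", "twin"), n, replace = TRUE))
p <- plogis(-0.5 + 0.4 * (sex == "male") + 1.0 * (type == "twin"))
status <- rbinom(n, size = 1, prob = p)

m1 <- glm(status ~ sex,        family = binomial(link = "logit"))
m2 <- glm(status ~ sex + type, family = binomial(link = "logit"))

# Likelihood ratio statistic = drop in deviance; df = number of added parameters
lr_stat <- deviance(m1) - deviance(m2)
lr_df   <- df.residual(m1) - df.residual(m2)
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p = lr_p))

# anova() on the two fits reports the same test
tab <- anova(m1, m2, test = "Chisq")
print(tab)
```

The test is valid only when the smaller model is nested in the larger one and both are fitted to the same rows, so missing values in the added factor need to be handled before fitting.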
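#to assess the model, plot predicted probability
#against observed proportion
#install and load library(languageR)
library(languageR)
#the function takes the fitted model and the data frame; languageR
#documents it for models fitted with lme4 or rms::lrm(), so a plain
#glm() fit may be rejected
plot.logistic.fit.fnc(logres1,data_in)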
19. Model diagnostics.
#the following produces plots of residuals vs. fitted
#values, a QQ plot, a scale-location plot, and residuals vs.
#leverage for the previously fitted model 1 (lmod1)
plot(lmod1)
#assessing a logit model for binary data by plotting
#the predicted probability against observed
#proportions
#install and load library(languageR)
library(languageR)
plot.logistic.fit.fnc(logres1,data_in)
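The idea behind plotting predicted probability against observed proportion can be sketched in base R for a plain `glm()` fit. This is an approximation of the concept, not a reimplementation of the languageR function. The data are simulated, and the choice of ten bins is an assumption.

```r
set.seed(9)
# Simulated binary outcome with one continuous predictor (illustrative only)
n <- 500
x <- rnorm(n)
status <- rbinom(n, size = 1, prob = plogis(-0.3 + 1.2 * x))
fit <- glm(status ~ x, family = binomial(link = "logit"))

# Bin observations into ten groups by predicted probability
pred <- fitted(fit)
bins <- cut(pred, breaks = quantile(pred, probs = seq(0, 1, 0.1)),
            include.lowest = TRUE)

mean_pred <- tapply(pred,   bins, mean)  # average predicted probability per bin
obs_prop  <- tapply(status, bins, mean)  # observed proportion of 1s per bin

# Bins where the two columns agree indicate good calibration
print(round(cbind(mean_pred, obs_prop), 2))
```

Calling `plot(mean_pred, obs_prop)` followed by `abline(0, 1)` draws the comparison; points close to the identity line mean the predicted probabilities match what was observed.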
20. Extracting residuals and writing them to a file—assuming
lmod2 is the model of choice.
res_lmod2 <-residuals(lmod2)
write.table(res_lmod2, file = "res_FWEC")
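When the response has missing values, `residuals()` by default returns fewer values than there are rows in the data, which makes the written file hard to match back to the animals. The sketch below shows one way around this with `na.action = na.exclude`; the data, the column names, and the temporary file are made up for illustration.

```r
set.seed(2)
# Simulated data with one missing response (illustrative only)
d <- data.frame(x = rnorm(20))
d$y <- 1 + 2 * d$x + rnorm(20)
d$y[3] <- NA

# na.exclude pads the residuals with NA so they line up with the rows of d
fit <- lm(y ~ x, data = d, na.action = na.exclude)
res <- residuals(fit)

# Write row index and residual, then read the file back as a check
out <- tempfile(fileext = ".txt")
write.table(data.frame(row = seq_along(res), residual = res),
            file = out, row.names = FALSE, quote = FALSE)
back <- read.table(out, header = TRUE)
print(head(back))
```

Writing an explicit row or animal identifier next to each residual avoids relying on the row names that `write.table()` adds by default.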
