R Bootcamp Day 3 Part 1 - Statistics in R
Jefferson Davis
Olga Scrivner
Day 2 stuff
From yesterday and the day before
• R values have types/classes such as numeric, character,
logical, dataframes, and matrices.
• Much of R's functionality is in libraries.
• For help on a function, run
?t.test
from the R console.
• The plot() function will usually do something useful.
R: Common stats functions
Common statistical tests are very straightforward in R. Let's try
one on yesterday's dataset cars of car speeds and stopping
distances from the 1920s.
head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
R: Common stats functions
Here's a t-test of the null hypothesis that the mean of the speeds in cars is 12, against the two-sided alternative that it is not.
t.test(cars$speed, mu=12)
One Sample t-test
data: cars$speed
t = 4.5468, df = 49, p-value = 3.588e-05
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
13.89727 16.90273
sample estimates:
mean of x
15.4
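To see where the numbers in this output come from, here is a minimal sketch that recomputes the t statistic by hand and compares it with the value stored in the object returned by t.test(). The variable names (x, se, t_manual, tt) are illustrative and not from the slides.

```r
# Recompute the one-sample t statistic by hand and compare with t.test().
x  <- cars$speed
n  <- length(x)
se <- sd(x) / sqrt(n)             # standard error of the mean
t_manual <- (mean(x) - 12) / se   # (sample mean - hypothesized mean) / SE

tt <- t.test(x, mu = 12)
tt$statistic   # t statistic
tt$parameter   # degrees of freedom, n - 1
tt$p.value     # two-sided p-value
tt$conf.int    # 95 percent confidence interval for the mean

# The hand computation agrees with the reported statistic.
all.equal(unname(tt$statistic), t_manual)
```

The result of t.test() is a list (class "htest"), so individual pieces can be pulled out with $ rather than read off the printed output.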
R: Common stats functions
We can change the parameters of t.test(), for example to run a one-sided test at a different confidence level.
t.test(cars$speed, mu = 12, alternative = "less", conf.level = 0.99)
One Sample t-test
data: cars$speed
t = 4.5468, df = 49, p-value = 1
alternative hypothesis: true mean is less than 12
99 percent confidence interval:
-Inf 17.19834
sample estimates:
mean of x
15.4
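The p-value of 1 above is a rounded display: the sample mean (15.4) lies above 12, so the data give essentially no support to the "less" alternative. As a rough sketch of how the three alternatives relate (variable names are illustrative):

```r
# Compare the p-values for the three possible alternatives.
p_two     <- t.test(cars$speed, mu = 12)$p.value
p_less    <- t.test(cars$speed, mu = 12, alternative = "less")$p.value
p_greater <- t.test(cars$speed, mu = 12, alternative = "greater")$p.value

c(two.sided = p_two, less = p_less, greater = p_greater)

# Because the t statistic is positive, the "greater" p-value is half the
# two-sided one, and the "less" p-value is its complement.
all.equal(p_greater, p_two / 2)
all.equal(p_less, 1 - p_greater)
```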
R: Common stats functions
Anything you would see in a year-long stats sequence will have
an implementation in R.
chisq.test() #Chi-squared
prop.test() #Proportions test
binom.test() #Exact binomial test
ks.test() #Kolmogorov–Smirnov
sd() #Standard deviation
cor() #Correlation
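A short sketch of a few of these functions in use. The cars calls use the dataset from the earlier slides; the counts passed to binom.test() (7 successes in 10 trials) are made up purely for illustration.

```r
# Standard deviation and correlation on the cars data.
sd(cars$speed)
r <- cor(cars$speed, cars$dist)
r

# cor.test() adds a significance test and confidence interval for the correlation.
ct <- cor.test(cars$speed, cars$dist)
ct$estimate
ct$p.value

# Exact binomial test on hypothetical data: 7 successes out of 10 trials,
# testing whether the success probability is 0.5.
bt <- binom.test(7, 10, p = 0.5)
bt$p.value
```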
R: Linear regression
Regression analysis is one of the most popular and important
tools in statistics. If R goofed here, it would be worthless.
R uses the function lm() for linear models. The regression
formula is given in Wilkinson-Rogers notation
Predictor terms        Wilkinson notation
Intercept              1 (default)
No intercept           -1
x1                     x1
x1, x2                 x1 + x2
x1, x2, x1x2           x1*x2 (or x1 + x2 + x1:x2)
x1x2                   x1:x2
x1^2, x1               x1 + I(x1^2) (in an R formula, x1^2 on its own reduces to x1)
x1 + x2                I(x1 + x2) (the letter I)
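One way to check what a formula expands to is to look at the columns of the design matrix R builds from it. The sketch below uses a small made-up data frame d (not from the slides) and model.matrix(). Note that in R the ^ operator in a formula means crossing, so x1^2 collapses to x1; a squared term has to be protected with I().

```r
# A tiny made-up data frame, used only to inspect design-matrix columns.
d <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(2, 1, 4, 3))

cols <- function(f) colnames(model.matrix(f, data = d))

cols(~ x1 + x2)        # intercept plus two main effects
cols(~ x1 + x2 - 1)    # no intercept
cols(~ x1 * x2)        # main effects plus the interaction
cols(~ x1:x2)          # interaction only (plus intercept)
cols(~ x1^2)           # in R this is just x1
cols(~ x1 + I(x1^2))   # x1 and its square
cols(~ I(x1 + x2))     # a single predictor equal to the sum
```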
R: Linear regression
Regression analysis is one of the most important tools in
statistics. R uses Wilkinson-Rogers notation to specify linear
models. So a model such as
y_i = β_0 + β_1 x_{i1} + ε_i
shows up in the R syntax as
y ~ x1
Let's review this syntax.
(Tables from https://www.mathworks.com/help/stats/wilkinson-notation.html)
R: Linear regression
Model and Wilkinson notation
Two predictors:
    y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + ε_i
    y ~ x1 + x2
Two predictors and no intercept:
    y_i = β_1 x_{i1} + β_2 x_{i2} + ε_i
    y ~ x1 + x2 - 1
Two predictors with the interaction term:
    y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + β_3 x_{i1} x_{i2} + ε_i
    y ~ x1 * x2
    y ~ x1 + x2 + x1:x2
Regressing on the sum of predictors:
    y_i = β_0 + β_1 (x_{i1} + x_{i2}) + ε_i
    y ~ I(x1 + x2)
Three predictors with one interaction:
    y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + β_3 x_{i3} + β_4 x_{i1} x_{i2} + ε_i
    y ~ x1 * x2 + x3
R: Linear regression
Model terms and Wilkinson notation
Two predictors with the interaction term, no intercept:
    y_i = β_1 x_{i1} + β_2 x_{i2} + β_3 x_{i1} x_{i2} + ε_i
    y ~ x1*x2 - 1
Three predictors, all interaction terms:
    y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + β_3 x_{i3} + β_4 x_{i1} x_{i2} + β_5 x_{i1} x_{i3} + β_6 x_{i2} x_{i3} + β_7 x_{i1} x_{i2} x_{i3} + ε_i
    y ~ x1 * x2 * x3
Three predictors, all two-way interaction terms:
    y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + β_3 x_{i3} + β_4 x_{i1} x_{i2} + β_5 x_{i1} x_{i3} + β_6 x_{i2} x_{i3} + ε_i
    y ~ x1 * x2 * x3 - x1:x2:x3
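A formula can be expanded without any data by asking terms() for its term labels. The sketch below uses a small helper, labels_of (a name made up here), to confirm the expansions above. It also shows an equivalent shorthand, (x1 + x2 + x3)^2, which is standard R formula syntax but does not appear on the slides.

```r
# Helper (hypothetical name): list the terms an R formula expands to.
labels_of <- function(f) attr(terms(f), "term.labels")

full    <- labels_of(y ~ x1 * x2 * x3)             # all interactions
two_way <- labels_of(y ~ x1 * x2 * x3 - x1:x2:x3)  # drop the three-way term
crossed <- labels_of(y ~ (x1 + x2 + x3)^2)         # crossing up to degree 2

full
two_way
crossed

# Subtracting 1 removes the intercept; terms() records this as 0.
attr(terms(y ~ x1 * x2 - 1), "intercept")
```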
R: Linear regression
• R uses the function lm() for linear models.
• Generic syntax
lm(DV ~ IV1, data = NAME_OF_DATAFRAME)
• The above tells R to regress the dependent variable (DV)
on the independent variable IV1. We can include other
variables and interaction effects.
lm(DV ~ IV1 + IV2 + IV1:IV2,
data = NAME_OF_DATAFRAME)
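As a sanity check on the syntax, here is a sketch that simulates data from a known model with an interaction and fits it with lm(). Everything in it (the data frame sim, the true coefficients 1, 2, -3, 0.5, the noise level) is invented for illustration.

```r
# Simulate data from y = 1 + 2*x1 - 3*x2 + 0.5*x1*x2 + noise (made-up model).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + 0.5 * x1 * x2 + rnorm(n, sd = 0.1)
sim <- data.frame(y, x1, x2)

# x1 * x2 expands to x1 + x2 + x1:x2, so this fits both main effects
# and the interaction.
fit <- lm(y ~ x1 * x2, data = sim)
coef(fit)
```

With this little noise, the estimated coefficients should land close to the values used to generate the data.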
R: Linear regression
• Let's do an example using the cars data set. How about
regressing stopping distance on speed?
lm(dist ~ speed, cars)
Call:
lm(formula = dist ~ speed, data = cars)
Coefficients:
(Intercept)        speed
    -17.579        3.932
• To work with the fit further, let's store it in a variable.
car.fit <- lm(dist ~ speed, cars)
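Once the fit is stored, one common next step is prediction. A minimal sketch using predict() with a small data frame of new speeds (the speeds 10 and 20 are arbitrary choices for illustration):

```r
car.fit <- lm(dist ~ speed, data = cars)

# New speeds to predict at; the column name must match the predictor.
new_speeds <- data.frame(speed = c(10, 20))

# Point predictions: intercept + slope * speed.
p <- predict(car.fit, newdata = new_speeds)
p

# The same predictions with 95 percent prediction intervals.
predict(car.fit, newdata = new_speeds, interval = "prediction")
```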
R: Linear regression
summary(car.fit)
Call:
lm(formula = dist ~ speed, data = cars)
Residuals:
Min 1Q Median 3Q Max
-29.069 -9.525 -2.272 9.215 43.201
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791 6.7584 -2.601 0.0123 *
speed 3.9324 0.4155 9.464 1.49e-12 ***
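The summary object can also be queried programmatically instead of read off the screen. A sketch, assuming the same fit as above (the name s is illustrative):

```r
car.fit <- lm(dist ~ speed, data = cars)
s <- summary(car.fit)

coef(s)            # coefficient table as a numeric matrix
s$r.squared        # proportion of variance explained
s$sigma            # residual standard error
confint(car.fit)   # 95 percent confidence intervals for the coefficients

# Pull out a single entry, e.g. the p-value for the slope.
coef(s)["speed", "Pr(>|t|)"]
```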
R: Linear regression
• We can also look at individual fields of the lm object.
car.fit$coefficients
(Intercept) speed
-17.579095 3.932409
car.fit$residuals[1:3]
1 2 3
3.849460 11.849460 -5.947766
car.fit$fitted.values[1:3]
1 2 3
-1.849460 -1.849460 9.947766
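The same pieces are available through accessor functions, which also work for other model classes. The sketch below also checks the relationship between them: each residual is the observed value minus the fitted value.

```r
car.fit <- lm(dist ~ speed, data = cars)

b   <- coef(car.fit)        # same as car.fit$coefficients
res <- residuals(car.fit)   # same as car.fit$residuals
fit <- fitted(car.fit)      # same as car.fit$fitted.values

# Fitted values are intercept + slope * speed ...
all.equal(unname(fit), unname(b[1] + b[2] * cars$speed))

# ... and residuals are observed minus fitted.
all.equal(unname(res), cars$dist - unname(fit))
```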
R: Linear regression
• Plot the fit
plot(cars$speed, cars$dist,
     xlab = "speed",
     ylab = "distance")
abline(car.fit, col = "red")
R: Linear regression
• Objects of class lm have their own plot() method, which draws a
series of diagnostic plots.
plot(car.fit)
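By default the diagnostic plots are shown one at a time. A sketch of two ways to control that, using base graphics only: a 2-by-2 layout to see all four at once, and the which argument to pick a single plot.

```r
car.fit <- lm(dist ~ speed, data = cars)

# Show the four default diagnostic plots on one page.
op <- par(mfrow = c(2, 2))   # save old settings, switch to a 2 x 2 grid
plot(car.fit)
par(op)                      # restore the previous layout

# Draw only the first plot (residuals against fitted values).
plot(car.fit, which = 1)
```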
R: Mixed models
• Let's take a look at a mixed model. We need a more complex
dataset. We use a subset of the Orthodont data set from the
nlme (Linear and Nonlinear Mixed Effects Models) package.
library(nlme)
head(Orthodont)
Grouped Data: distance ~ age | Subject
  distance age Subject  Sex
1     26.0   8     M01 Male
2     25.0  10     M01 Male
3     29.0  12     M01 Male
4     31.0  14     M01 Male
R: Mixed models
OrthoFem <- Orthodont[Orthodont$Sex == "Female", ]
plot(OrthoFem)
R: Mixed models
It doesn't seem crazy to fit a common slope but use a random effect for
the intercept.
fmOrthF <- lme(distance ~ age,
               data = OrthoFem,
               random = ~ 1 | Subject)
R: Mixed models
In fact, it isn't crazy.
summary(fmOrthF)
Linear mixed-effects model fit by REML
Data: OrthoFem
AIC BIC logLik
149.2183 156.169 -70.60916
Random effects:
 Formula: ~1 | Subject
        (Intercept)  Residual
StdDev:     2.06847 0.7800331
Fixed effects: distance ~ age
                Value Std.Error DF   t-value p-value
(Intercept) 17.372727 0.8587419 32 20.230440       0
age          0.479545 0.0525898 32  9.118598       0
Correlation:
    (Intr)
age -0.674
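To see what the random intercept means in practice, the fixed and random parts of the fit can be pulled apart. The sketch below refits the same model so that it runs on its own, then uses the nlme accessors fixef(), ranef(), and coef(); the names fe, re, and per_subject are illustrative.

```r
library(nlme)

# Same subset and model as on the slides.
OrthoFem <- Orthodont[Orthodont$Sex == "Female", ]
fmOrthF  <- lme(distance ~ age, data = OrthoFem, random = ~ 1 | Subject)

fe <- fixef(fmOrthF)   # population-level intercept and slope
re <- ranef(fmOrthF)   # one intercept offset per subject
fe
re

# Per-subject coefficients: each subject gets the common slope and
# her own intercept (fixed intercept plus her random offset).
per_subject <- coef(fmOrthF)
per_subject
```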
R: Conditional trees
At this point, I tag Olga in.
