Lab 14 – Model Selection and
Multimodel Inference
November 26 & 27, 2018
FANR 6750
Richard Chandler and Bob Cooper
Today’s Topics
1 Model Fitting
2 Model Selection
3 Multi-model Inference
Swiss Data
swissData <- read.csv("swissData.csv")
head(swissData, n=11)
## elevation forest water sppRichness
## 1 450 3 No 35
## 2 450 21 No 51
## 3 1050 32 No 46
## 4 950 9 Yes 31
## 5 1150 35 Yes 50
## 6 550 2 No 43
## 7 750 6 No 37
## 8 650 60 Yes 47
## 9 550 5 Yes 37
## 10 550 13 No 43
## 11 1150 50 No 52
Model Fitting Model Selection Multi-model Inference 3 / 15
Four linear models
fm1 <- lm(sppRichness ~ forest, data=swissData)
fm2 <- lm(sppRichness ~ elevation, data=swissData)
fm3 <- lm(sppRichness ~ forest + elevation +
water, data=swissData)
fm4 <- lm(sppRichness ~ forest + elevation +
I(elevation^2) + water, data=swissData)
Model 4 – Estimates
summary(fm4)
##
## Call:
## lm(formula = sppRichness ~ forest + elevation + I(elevation^2) +
## water, data = swissData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.314 -3.205 -0.377 3.334 15.082
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.518e+01 1.286e+00 35.137 < 2e-16 ***
## forest 2.311e-01 1.276e-02 18.111 < 2e-16 ***
## elevation -1.016e-02 2.572e-03 -3.951 0.0001 ***
## I(elevation^2) 6.103e-08 9.661e-07 0.063 0.9497
## waterYes -3.013e+00 6.821e-01 -4.418 1.46e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.954 on 262 degrees of freedom
## Multiple R-squared: 0.7929, Adjusted R-squared: 0.7897
## F-statistic: 250.8 on 4 and 262 DF, p-value: < 2.2e-16
Model 4 – ANOVA table
summary.aov(fm4)
## Df Sum Sq Mean Sq F value Pr(>F)
## forest 1 13311 13311 542.40 < 2e-16 ***
## elevation 1 10820 10820 440.89 < 2e-16 ***
## I(elevation^2) 1 7 7 0.27 0.604
## water 1 479 479 19.52 1.46e-05 ***
## Residuals 262 6430 25
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We could compute AIC using the equation AIC = n log(RSS/n) + 2K,
where RSS is the residual sum-of-squares.
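However, we will use the more general formula: AIC = −2L(θ̂; y) + 2K, where L(θ̂; y) is the log-likelihood evaluated at the maximum likelihood estimates θ̂ and K is the number of parameters.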
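For a linear model with normally distributed errors, the two formulas differ only by a constant that depends on n. A short derivation, assuming the maximum likelihood variance estimate \hat\sigma^2 = RSS/n (a symbol introduced here for illustration):

```latex
\begin{align*}
L(\hat\theta; y)
  &= -\frac{n}{2}\log(2\pi\hat\sigma^2) - \frac{\mathrm{RSS}}{2\hat\sigma^2}
   = -\frac{n}{2}\left[\log(2\pi) + \log(\mathrm{RSS}/n) + 1\right], \\
-2L(\hat\theta; y) + 2K
  &= n\log(\mathrm{RSS}/n) + 2K + n\left[\log(2\pi) + 1\right].
\end{align*}
```

The extra term n[log(2π) + 1] is the same for every model fit to the same data, so it cancels when models are compared.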
Model Selection
Compute AIC for each model
Sample size
n <- nrow(swissData)
log-likelihood for each model
logL <- c(logLik(fm1), logLik(fm2), logLik(fm3), logLik(fm4))
Number of parameters (regression coefficients, including the intercept, plus the residual variance)
K <- c(3, 3, 5, 6)
AIC
AIC <- -2*logL + 2*K
∆AIC
delta <- AIC - min(AIC)
AIC Weights
w <- exp(-0.5*delta)/sum(exp(-0.5*delta))
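As a quick check of the weight formula, the sketch below applies it to the four AIC values that R's AIC function reports for these models later in the lab. The helper name aicWeights is hypothetical and is not part of the lab code.

```r
# Hypothetical helper (not from the lab): Akaike weights from a vector of AIC values
aicWeights <- function(aic) {
  delta <- aic - min(aic)     # Delta AIC relative to the best model
  rel <- exp(-0.5 * delta)    # relative likelihood of each model
  rel / sum(rel)              # normalize so the weights sum to 1
}

# AIC values for fm1-fm4 as reported in the slides
aicVals <- c(fm1 = 1884.057, fm2 = 1874.146, fm3 = 1617.157, fm4 = 1619.153)
w <- aicWeights(aicVals)
round(w, digits = 2)
```

The rounded weights agree with the w column of the AIC table: 0.73 for fm3, 0.27 for fm4, and essentially zero for fm1 and fm2.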
AIC table
Put vectors in data.frame
ms <- data.frame(logL, K, AIC, delta, w)
rownames(ms) <- c("fm1", "fm2", "fm3", "fm4")
round(ms, digits=2)
## logL K AIC delta w
## fm1 -939.03 3 1884.06 266.90 0.00
## fm2 -934.07 3 1874.15 256.99 0.00
## fm3 -803.58 5 1617.16 0.00 0.73
## fm4 -803.58 6 1619.15 2.00 0.27
Sort data.frame based on AIC values
ms <- ms[order(ms$AIC),]
round(ms, digits=2)
## logL K AIC delta w
## fm3 -803.58 5 1617.16 0.00 0.73
## fm4 -803.58 6 1619.15 2.00 0.27
## fm2 -934.07 3 1874.15 256.99 0.00
## fm1 -939.03 3 1884.06 266.90 0.00
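The parameter counts in K do not have to be typed by hand, because the object returned by logLik stores them in its "df" attribute. The sketch below builds the same kind of table that way. Since swissData.csv is not included here, it uses a simulated stand-in called simData, whose values are hypothetical and chosen only for illustration.

```r
# Simulated stand-in for swissData (hypothetical values, for illustration only)
set.seed(1)
n <- 200
simData <- data.frame(elevation = runif(n, 300, 2500),
                      forest = runif(n, 0, 100),
                      water = sample(c("No", "Yes"), n, replace = TRUE))
simData$sppRichness <- 45 + 0.2 * simData$forest - 0.01 * simData$elevation -
  3 * (simData$water == "Yes") + rnorm(n, sd = 5)

# Same four model structures as in the lab, stored in a named list
mods <- list(
  fm1 = lm(sppRichness ~ forest, data = simData),
  fm2 = lm(sppRichness ~ elevation, data = simData),
  fm3 = lm(sppRichness ~ forest + elevation + water, data = simData),
  fm4 = lm(sppRichness ~ forest + elevation + I(elevation^2) + water,
           data = simData))

# Log-likelihood and its parameter count, extracted from each fitted model
logL <- sapply(mods, function(m) as.numeric(logLik(m)))
K <- sapply(mods, function(m) attr(logLik(m), "df"))

aic <- -2 * logL + 2 * K
delta <- aic - min(aic)
w <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))

# Assemble and sort the model-selection table
ms <- data.frame(logL, K, AIC = aic, delta, w)
ms <- ms[order(ms$AIC), ]
round(ms, digits = 2)
```

Storing the models in a named list keeps the row names attached automatically and avoids a mismatch between the models and a hand-entered K vector.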
Similar process using R’s AIC function
AIC(fm1, fm2, fm3, fm4)
## df AIC
## fm1 3 1884.057
## fm2 3 1874.146
## fm3 5 1617.157
## fm4 6 1619.153
Notes
• If we had used the residual sums-of-squares instead of the
log-likelihoods, the AIC values would have been different, but
the ∆AIC values would have been the same
• Either approach is fine with linear models, but log-likelihoods
must be used with GLMs and other models fit using maximum
likelihood
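The first note can be checked numerically. The sketch below computes AIC both ways for the four model structures and compares the ∆AIC values. It runs on a simulated stand-in for swissData (simData, with hypothetical values), so the numbers will not match the lab output.

```r
# Simulated stand-in for swissData (hypothetical values, for illustration only)
set.seed(1)
n <- 200
simData <- data.frame(elevation = runif(n, 300, 2500),
                      forest = runif(n, 0, 100),
                      water = sample(c("No", "Yes"), n, replace = TRUE))
simData$sppRichness <- 45 + 0.2 * simData$forest - 0.01 * simData$elevation -
  3 * (simData$water == "Yes") + rnorm(n, sd = 5)

mods <- list(
  fm1 = lm(sppRichness ~ forest, data = simData),
  fm2 = lm(sppRichness ~ elevation, data = simData),
  fm3 = lm(sppRichness ~ forest + elevation + water, data = simData),
  fm4 = lm(sppRichness ~ forest + elevation + I(elevation^2) + water,
           data = simData))

K <- sapply(mods, function(m) attr(logLik(m), "df"))
rss <- sapply(mods, function(m) sum(resid(m)^2))   # residual sum-of-squares

aicRSS <- n * log(rss / n) + 2 * K   # AIC from the RSS formula
aicLik <- sapply(mods, AIC)          # AIC from the log-likelihood

# The AIC values differ by the same constant, n * (log(2 * pi) + 1), for every model
aicLik - aicRSS

# The Delta AIC values are identical
all.equal(aicRSS - min(aicRSS), aicLik - min(aicLik))
```

Because the offset does not depend on the model, the ranking of the models and the AIC weights are unchanged.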
Multi-model Inference
Model-specific predictions
Expected number of species at 1000 m elevation, 25% forest cover, and no
water, for each model
predData1 <- data.frame(elevation=1000, forest=25, water="No")
E1 <- predict(fm1, newdata=predData1, type="response")
as.numeric(E1) # remove names (optional)
## [1] 37.90222
E2 <- predict(fm2, newdata=predData1, type="response")
as.numeric(E2)
## [1] 42.53368
E3 <- predict(fm3, newdata=predData1, type="response")
as.numeric(E3)
## [1] 40.88604
E4 <- predict(fm4, newdata=predData1, type="response")
as.numeric(E4)
## [1] 40.86092
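The four predict calls follow the same pattern, so they can also be written as a single sapply over a list of models. This is a minimal sketch on a simulated stand-in for swissData (simData, with hypothetical values), so the predictions will differ from those shown above.

```r
# Simulated stand-in for swissData (hypothetical values, for illustration only)
set.seed(1)
n <- 200
simData <- data.frame(elevation = runif(n, 300, 2500),
                      forest = runif(n, 0, 100),
                      water = sample(c("No", "Yes"), n, replace = TRUE))
simData$sppRichness <- 45 + 0.2 * simData$forest - 0.01 * simData$elevation -
  3 * (simData$water == "Yes") + rnorm(n, sd = 5)

mods <- list(
  fm1 = lm(sppRichness ~ forest, data = simData),
  fm2 = lm(sppRichness ~ elevation, data = simData),
  fm3 = lm(sppRichness ~ forest + elevation + water, data = simData),
  fm4 = lm(sppRichness ~ forest + elevation + I(elevation^2) + water,
           data = simData))

predData1 <- data.frame(elevation = 1000, forest = 25, water = "No")

# One prediction per model, returned as a named numeric vector
E <- sapply(mods, function(m) as.numeric(predict(m, newdata = predData1)))
E
```

Each model only uses the columns of predData1 that appear in its formula, so the same data frame works for all four.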
Model-averaged prediction
Expected number of species at 1000 m elevation, 25% forest cover, and no
water, averaged over all four models using the AIC weights
E1*w[1] + E2*w[2] + E3*w[3] + E4*w[4]
## 1
## 40.87927
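The model-averaged value can be reproduced from numbers already reported in the slides, without refitting anything. The sketch below takes the four model-specific predictions and the four AIC values as plain vectors, recomputes the weights, and forms the weighted sum.

```r
# Values reported in the slides for fm1-fm4
E <- c(37.90222, 42.53368, 40.88604, 40.86092)         # model-specific predictions
aicVals <- c(1884.057, 1874.146, 1617.157, 1619.153)   # AIC values

# AIC weights
delta <- aicVals - min(aicVals)
w <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))

# Model-averaged prediction: weighted sum of the four predictions
sum(E * w)
```

The result is about 40.879. It lies between the fm3 and fm4 predictions because those two models carry essentially all of the weight.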
Model-averaged regression lines
Predict species richness over the range of forest cover (0 to 100%), holding elevation at 1000 m with no water, for each model
predData2 <- data.frame(forest=seq(0, 100, length.out=50),
elevation=1000, water="No")
E1 <- predict(fm1, newdata=predData2)
E2 <- predict(fm2, newdata=predData2)
E3 <- predict(fm3, newdata=predData2)
E4 <- predict(fm4, newdata=predData2)
Emat <- cbind(E1, E2, E3, E4)
How do we model-average these vectors?
Evec <- Emat %*% w
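The matrix product works because each row of Emat holds the four model predictions for one forest-cover value, so multiplying by the weight vector gives one weighted sum per row. A minimal sketch with a small made-up matrix (three rows instead of 50, hypothetical numbers) and the rounded weights from the AIC table:

```r
# Hypothetical predictions: 3 covariate values (rows) by 4 models (columns)
Emat <- cbind(E1 = c(30, 35, 40),
              E2 = c(42, 42, 42),
              E3 = c(33, 38, 43),
              E4 = c(33, 38, 44))
w <- c(0, 0, 0.73, 0.27)   # rounded AIC weights from the table

# 3 x 4 matrix times a length-4 vector gives a 3 x 1 matrix
Evec <- Emat %*% w

# Same result written as an explicit weighted sum of the columns
Evec2 <- Emat[, 1] * w[1] + Emat[, 2] * w[2] + Emat[, 3] * w[3] + Emat[, 4] * w[4]
all.equal(as.numeric(Evec), Evec2)
```

Because Evec is a one-column matrix, as.numeric(Evec) or drop(Evec) converts it to a plain vector if one is needed.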
Model-averaged regression line
plot(sppRichness~forest, data=swissData, xlab="Forest cover", ylab="Species richness", cex.lab=1.5)
lines(E1 ~ forest, predData2, col="lightgreen", lwd=4)
lines(E2 ~ forest, predData2, col="orange", lwd=3)
lines(E3 ~ forest, predData2, col="purple", lwd=2)
lines(E4 ~ forest, predData2, col="red", lwd=1)
lines(Evec ~ forest, predData2, col=rgb(0,0,1,0.2), lwd=10)
legend(60, 30, c("Model 1","Model 2","Model 3","Model 4","Model averaged"), lty=1, cex=1.2,
lwd=c(4,3,2,1,10), col=c("lightgreen", "orange", "purple", "red", rgb(0,0,1,0.2)))
Figure: Species richness (y-axis, 10 to 60) versus forest cover (x-axis, 0 to 100), showing the observed data points with fitted lines for Model 1, Model 2, Model 3, Model 4, and the model-averaged prediction.