Generalized linear models, and extensions, in R

                                        Ben Bolker

           Departments of Mathematics & Statistics and Biology, McMaster University


                                     7 January 2011




Ben Bolker (McMaster University)           GLMs in R                      7 January 2011   1 / 25
1   Introduction


2   Example


3   Challenges, tricks, extensions


4   (Extended examples)




What are generalized linear models?




      Modeling framework to solve two common statistical problems:
             Non-normal data
             Non-linearity (continuous predictors)
     . . . superset of, and often confused with,
     “general” linear models (i.e. ANOVA/ANCOVA/regression:
     SAS PROC GLM)




GLMs: technical details


      Constraints:
             Distributions from exponential family
             (Normal, Poisson, binomial, Gamma, inverse Gaussian)
             Invertible nonlinearities, i.e. there exists a link function that would
             make the relationship linear
             (log, logit, probit, inverse, square root, “cauchit” . . . )
      Efficient, stable algorithm: iteratively re-weighted least squares
      (IRLS / Fisher scoring)
      standard methods (methods(class="glm")):
      coef, summary, plot, predict, residuals, vcov, profile,
      update, confint, simulate, anova, add1/drop1, logLik, AIC, . . .
      logistic and Poisson regression probably make up 99% of GLMs . . .
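
      The IRLS iteration can be sketched in a few lines using the pieces
      stored in a family object (linkinv, mu.eta, variance). This is a
      minimal sketch on simulated data, not the code inside glm(): the
      names X, beta, z and w are illustrative assumptions, and glm() adds
      step-halving and other safeguards that the loop below omits.

```r
## Simulated (hypothetical) binary data
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
X <- cbind(1, x)                  # design matrix with intercept

fam <- binomial()                 # logit link by default
beta <- c(0, 0)                   # starting values
for (iter in 1:25) {
  eta <- drop(X %*% beta)         # linear predictor
  mu <- fam$linkinv(eta)          # mean on the response scale
  dmu <- fam$mu.eta(eta)          # d mu / d eta
  z <- eta + (y - mu) / dmu       # working response
  w <- dmu^2 / fam$variance(mu)   # working weights
  beta_new <- lm.wfit(X, z, w)$coefficients  # weighted least squares
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

## Compare with the built-in fit
fit <- glm(y ~ x, family = binomial)
irls_matches_glm <- isTRUE(all.equal(unname(beta), unname(coef(fit)),
                                     tolerance = 1e-6))
```

      Each pass is just a weighted linear regression of the working
      response z on X, which is why the algorithm is cheap and stable.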



Google Scholar scraping

      Google Scholar hits ("Ghits") per search term (plotted on a log10 axis,
      10^4 to 10^6):

             Search term                  Ghits
             logistic+regression         580000
             Poisson+regression           39300
             generalized+linear+model     28700
             binomial+regression          13500

Example: reed frog predation data


      [Figure: fraction killed (0 to 1) vs. initial density (0 to 100)]

      Vonesh and Bolker (2005):
      > library(emdbook)
      > data(ReedfrogFuncresp)
      > glm1 <- glm(Killed/Initial~Initial,
                    weights=Initial,
                    family=binomial,
                    data=ReedfrogFuncresp)

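
      A binomial GLM can be specified either as a proportion with the number
      of trials as weights, or as a two-column (successes, failures)
      response. The sketch below suggests that the two forms give the same
      fit. It uses simulated data (an assumption made so that it runs
      without the emdbook package); the column names simply mirror the
      reed frog data.

```r
## Simulated (hypothetical) predation trials, not the reed frog data
set.seed(42)
d <- data.frame(Initial = rep(c(5, 10, 20, 50, 100), each = 3))
d$Killed <- rbinom(nrow(d), size = d$Initial,
                   prob = plogis(-0.1 - 0.008 * d$Initial))

## Form 1: proportion response, number of trials as prior weights
fit_prop <- glm(Killed / Initial ~ Initial, weights = Initial,
                family = binomial, data = d)

## Form 2: two-column (successes, failures) response
fit_cbind <- glm(cbind(Killed, Initial - Killed) ~ Initial,
                 family = binomial, data = d)

same_coef <- isTRUE(all.equal(coef(fit_prop), coef(fit_cbind)))
same_dev <- isTRUE(all.equal(deviance(fit_prop), deviance(fit_cbind)))
```

      Omitting the weights in the first form would treat every proportion
      as a single trial, so the weights argument is not optional there.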
Summary
> summary(glm1)
Call:
glm(formula = Killed/Initial ~ Initial, family = binomial, data = ReedfrogFuncresp,
    weights = Initial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.4132  -0.7275   0.4347   1.0120   1.8172

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.094563   0.188952   -0.50  0.61675
Initial     -0.008416   0.002697   -3.12  0.00181 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 47.518  on 15  degrees of freedom
Residual deviance: 37.717  on 14  degrees of freedom
AIC: 98.639

Number of Fisher Scoring iterations: 4

Diagnostics

      [Figure: plot(glm1) diagnostic panels: Residuals vs Fitted, Normal Q−Q,
      Scale−Location, Residuals vs Leverage (with Cook's distance contours);
      extreme observations (e.g. 11, 13, 16) are labelled]

      diagnostics inherit from plot.lm
      overdispersion: residual deviance ≈ χ²_{n−p}
      (Venables and Ripley, 2002, p. 209):
      sum(residuals(glm1,type="pearson")^2) = 34.3: p < 0.05

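
      The Pearson-based overdispersion check can be sketched as follows.
      The data are simulated with extra-binomial noise (an assumption for
      illustration, not the reed frog data); pearson_chisq, rdf, phi_hat
      and p_value are hypothetical names.

```r
## Simulated (hypothetical) overdispersed binomial data
set.seed(7)
d <- data.frame(Initial = rep(c(5, 10, 20, 50, 100), each = 4))
## extra-binomial variation: each trial gets its own noisy probability
p <- plogis(-0.1 - 0.008 * d$Initial + rnorm(nrow(d), sd = 0.6))
d$Killed <- rbinom(nrow(d), size = d$Initial, prob = p)

fit <- glm(cbind(Killed, Initial - Killed) ~ Initial,
           family = binomial, data = d)

pearson_chisq <- sum(residuals(fit, type = "pearson")^2)
rdf <- df.residual(fit)                  # n - p
phi_hat <- pearson_chisq / rdf           # estimated dispersion
## upper-tail probability under the chi-squared(n - p) reference
p_value <- pchisq(pearson_chisq, df = rdf, lower.tail = FALSE)
```

      For the reed frog fit, the same calculation would be
      pchisq(34.3, df=14, lower.tail=FALSE), and 34.3/14 gives the
      dispersion estimate of about 2.45.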
Inference


      Coefficients: may be hard to communicate (reflect differences on the
      scale of linear predictor, e.g. logit/log-odds differences)
      Wald statistics: beware the Hauck-Donner effect
      (Venables and Ripley, 2002, p. 198). Wald CI of slope:
      stats:::confint.lm(glm1) (-0.0142,-0.0026)
      Likelihood ratio test, via anova:
      > anova(glm1,test="Chisq") ## OR
      > glm0 <- update(glm1, . ~ . - Initial)
      > anova(glm1,glm0,test="Chisq")
      Likelihood profiles (via MASS::profile.glm),
      profile confidence intervals:
      MASS:::confint.glm(glm1) (-0.0137,-0.0031)
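
      A sketch comparing Wald intervals, profile intervals and the
      likelihood ratio test on simulated data (an assumption; the object
      names are illustrative). confint.default() gives normal-approximation
      Wald intervals; confint() on a glm gives profile intervals, supplied
      by MASS or by stats depending on the R version.

```r
## Simulated (hypothetical) binomial data
set.seed(11)
d <- data.frame(Initial = rep(c(5, 10, 20, 50, 100), each = 4))
d$Killed <- rbinom(nrow(d), size = d$Initial,
                   prob = plogis(-0.1 - 0.008 * d$Initial))

fit <- glm(cbind(Killed, Initial - Killed) ~ Initial,
           family = binomial, data = d)

wald_ci <- confint.default(fit)           # estimate +/- z * SE
prof_ci <- suppressMessages(confint(fit)) # profile-likelihood intervals

## Likelihood ratio test for the slope: drop the term and compare
fit0 <- update(fit, . ~ . - Initial)
lrt <- anova(fit0, fit, test = "Chisq")
lrt_p <- lrt[["Pr(>Chi)"]][2]
```

      Profile intervals need not be symmetric about the estimate, which is
      one way to see where the Wald approximation is under strain.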



Estimation issues




      Convergence difficulties, especially with non-standard links: set
      starting values, center/scale variables (?)
      Complete separation: brglm, logistf, arm (bayesglm)
      Big data: biglm (bigglm)
      Many predictors (penalized regression):
      glmnet, glmpath, penalized (Machine learning task view)
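
      Complete separation is easy to reproduce. In this toy sketch
      (hypothetical data, base R only) the predictor splits the outcomes
      perfectly, so the maximum likelihood slope is infinite: glm() stops
      at a large finite value with an enormous standard error, which is
      what the packages listed above are meant to deal with.

```r
## Hypothetical toy data: x < 0 always gives y = 0, x > 0 always gives y = 1
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

## glm() warns that fitted probabilities are numerically 0 or 1
fit <- suppressWarnings(glm(y ~ x, family = binomial))

slope <- coef(fit)[["x"]]
slope_se <- sqrt(diag(vcov(fit)))[["x"]]
wald_z <- slope / slope_se   # tiny, even though x predicts y perfectly
```

      The near-zero Wald statistic for a perfect predictor is an extreme
      case of the unreliability of Wald tests when estimates are large.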




Tricks (within GLM framework)


      non-standard link functions:
             fitting hyperbolic models of predator attack rates (Michaelis-Menten)
             via binomial/inverse link
             (http://emdbolker.wikidot.com/voneshglm)
             exponential survivorship models via binomial/log link (Strong et al.,
             1999; Tiwari et al., 2006)
             Gaussian family with log link: fit exponential growth models with
             constant variance
      subtleties with Gamma GLMs and dispersion parameter:
      V&R MASS online complements,
      Paul Johnson’s notes
      offsets: variation in sampling area/intensity
      (e.g. strict proportionality)
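
      A sketch of the offset trick for counts collected over unequal
      sampling areas. The data frame and its columns (area, x, count) are
      simulated assumptions. With a log link, offset(log(area)) fixes the
      coefficient of log(area) at 1, i.e. strict proportionality between
      expected count and area.

```r
## Simulated (hypothetical) counts from plots of different area
set.seed(3)
d <- data.frame(area = runif(50, min = 1, max = 10), x = rnorm(50))
d$count <- rpois(50, lambda = d$area * exp(0.5 + 0.3 * d$x))

## Offset: log(area) enters the linear predictor with coefficient 1
fit_off <- glm(count ~ x + offset(log(area)), family = poisson, data = d)

## Alternative: estimate the coefficient of log(area) instead of fixing it
fit_free <- glm(count ~ x + log(area), family = poisson, data = d)

## The linear predictor of the offset model includes log(area) as is
eta_by_hand <- log(d$area) + coef(fit_off)[[1]] + coef(fit_off)[[2]] * d$x
```

      Comparing fit_off with fit_free (e.g. via anova) is one way to check
      whether strict proportionality is reasonable.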



Overdispersion

      Quasilikelihood models:
      > glmQ <- update(glm1,family="quasibinomial")
      > anova(glmQ,test="F")
      (φ̂ = 2.45). No likelihood: qAIC requires some contortions
      extended GLMs
             negative binomial: MASS (glm.nb)
             beta-binomial:
                     aod (betabin)
                     gnlm (gnlr)
                     VGAM (vglm)
                     bbmle (mle2)
      GLMMs: lognormal-Poisson, logit-normal-binomial
      robust estimation (lmtest, sandwich):
      > coeftest(glm1,vcov=sandwich)
See also the vignette for the pscl package.
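
      What the quasibinomial fit changes can be sketched on simulated
      overdispersed data (an assumption; object names are illustrative):
      the coefficients stay the same, and the standard errors are inflated
      by the square root of the estimated dispersion.

```r
## Simulated (hypothetical) overdispersed binomial data
set.seed(12)
d <- data.frame(Initial = rep(c(5, 10, 20, 50, 100), each = 4))
p <- plogis(-0.1 - 0.008 * d$Initial + rnorm(nrow(d), sd = 0.6))
d$Killed <- rbinom(nrow(d), size = d$Initial, prob = p)

fit_b <- glm(cbind(Killed, Initial - Killed) ~ Initial,
             family = binomial, data = d)
fit_q <- update(fit_b, family = quasibinomial)

phi_hat <- summary(fit_q)$dispersion   # Pearson chi-squared / residual df
se_b <- sqrt(diag(vcov(fit_b)))        # binomial standard errors
se_q <- sqrt(diag(vcov(fit_q)))        # quasibinomial standard errors
```

      Because the quasi-family has no likelihood, logLik() and AIC() are
      not available for fit_q, which is why qAIC takes extra work.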
Extensions




      Generalized additive models (Wood, 2006): mgcv, gamlss
      Zero-inflated/altered/hurdle models: pscl, VGAM
      Beta regression: betareg
      Generalized regression models: bbmle, VGAM, gnlm
      Random effects (generalized linear mixed models): lme4 and other
      packages (http://glmm.wikidot.com/faq)




References



Strong, D.R., Whipple, A.V., et al., 1999. Ecology, 80:2750–2761.
Tiwari, M., Bjorndal, K.A., et al., 2006. Marine Ecology Progress Series,
  326:283–293.
Venables, W. and Ripley, B.D., 2002. Modern Applied Statistics with S.
  Springer, New York, 4th edition.
Vonesh, J.R. and Bolker, B.M., 2005. Ecology, 86(6):1580–1591.
Wood, S.N., 2006. Generalized Additive Models: An Introduction with R.
 Chapman & Hall/CRC.




Basic ggplot code




> library(ggplot2)
> ggplot(ReedfrogFuncresp,aes(Initial,Killed/Initial))+
   geom_point()+
   ## current ggplot2 passes the family via method.args
   geom_smooth(method="glm",method.args=list(family=binomial),
               aes(weight=Initial))

Confidence intervals on # killed, by hand



> pframe <- data.frame(Initial=1:100)
> pp <- predict(glm1,newdata=pframe,se.fit=TRUE)
> pmat <- with(pp,plogis(cbind(fit,
                              fit-1.96*se.fit,
                              fit+1.96*se.fit)))
> par(bty="l",las=1)
> with(ReedfrogFuncresp,plot(Initial,Killed/Initial,
                            xlim=c(0,100),ylim=c(0,1),
                            pch=16))
> matlines(pframe$Initial,pmat,lty=c(1,2,2),col=1,type="l")




Prediction intervals

> simhack <- function(params) {
    glmnew <- glm1
    glmnew$coefficients <- params
    ## simulate() draws from the fitted values, so update them too
    glmnew$fitted.values <- plogis(drop(model.matrix(glm1) %*% params))
    ## simulates on PROBABILITY scale
    simulate(glmnew)[[1]]
  }
> set.seed(101)
> params <- MASS::mvrnorm(1000,mu=coef(glm1),
                          Sigma=vcov(glm1))
> sims <- apply(params,1,simhack)
> qmat <- t(apply(sims,1,quantile,
                  c(0.5,0.025,0.975)))

(Constructing the simulated values at Initial densities from
1 to 100 is a bit more work: ideally all simulate methods
would have newdata and newparam arguments . . . )

[Figure: Killed/Initial (0 to 1) vs. Initial (0 to 100), with simulated
prediction intervals]

Alternative display (display, coefplot from arm
package)

[Figure: coefplot(glm1): point estimate for Initial on an axis from
−0.015 to 0.000]

> display(glm1)
glm(formula = Killed/Initial ~ Initial, family = binomial, data = Re
    weights = Initial)
            coef.est coef.se
(Intercept) -0.09     0.19
Initial     -0.01     0.00
---
  n = 16, k = 2
  residual deviance = 37.7, null deviance = 47.5 (difference = 9.8)
Beta-binomial with aod




> library(aod)
> glmBB1 <- betabin(cbind(Killed, Initial-Killed)~Initial,
                       random=~1,
                       data=ReedfrogFuncresp)




Beta-binomial with bbmle




> library(bbmle)
> glmBB3 <- mle2(Killed~dbetabinom(prob=plogis(logitp),
      theta=exp(logtheta),size=Initial),
      parameters=list(logitp~Initial),
      data=ReedfrogFuncresp,
      start=list(logitp=0,logtheta=0))




Beta-binomial with VGAM




> library(VGAM)
> glmBB4 <- vglm(cbind(Killed,Initial-Killed)~Initial,
                betabinomial,
                data=ReedfrogFuncresp)
> coef(glmBB4,matrix=TRUE)




Beta-binomial with gnlm



> library(gnlm)
> attach(ReedfrogFuncresp) ## no data= argument!
> glmBB2 <- gnlr(cbind(Killed,Initial-Killed),
      dist="beta binomial",
      pmu=c(0,0),pshape=0,
      mu=function(p,linear) plogis(linear),
      linear=~Initial)
> detach(ReedfrogFuncresp)
> detach("package:gnlm")
> detach("package:rmutil")




Logit-normal-binomial with lme4




> library(lme4)
> ReedfrogFuncresp$ID <- 1:nrow(ReedfrogFuncresp)
> glmLNP <- glmer(cbind(Killed,Initial-Killed)~Initial+(1|ID),
                 family=binomial,
                 data=ReedfrogFuncresp)
> summary(glmLNP)




Alternate link functions for reed frog data


      [Figure: fraction killed (0 to 1) vs. initial density (0 to 100)]

Comparing overdispersion estimates


      [Figure: point estimates of the initial density effect (x axis,
      −0.015 to 0.000) for each model: LN−binomial, beta−binomial, sandwich,
      q−binom Wald, binomial profile, binomial Wald]

More Related Content

PPTX
Pte 60 band score tips
DOC
The Paper 16 Tenses English
DOCX
Academic ielts writing task 2 topic 07 - IELTSMaterial.com
PPTX
introduction to IELTS academic - Handout.pptx
PPTX
Maths study skills dfs-edc
PDF
open-source GLMM tools
PDF
Trondheim glmm
PDF
Ecological synthesis across scales: West Nile virus in individuals and commun...
Pte 60 band score tips
The Paper 16 Tenses English
Academic ielts writing task 2 topic 07 - IELTSMaterial.com
introduction to IELTS academic - Handout.pptx
Maths study skills dfs-edc
open-source GLMM tools
Trondheim glmm
Ecological synthesis across scales: West Nile virus in individuals and commun...

More from Ben Bolker (20)

PDF
evolution of virulence: devil in the details
PDF
model complexity and model choice for animal movement models
PDF
model complexity and model choice for animal movement models
PDF
Fundamental principles (?) of biological data
PDF
ESS of minimal mutation rate in an evo-epidemiological model
PPTX
math bio for 1st year math students
PDF
MBRS detectability talk
PDF
Waterloo GLMM talk
PDF
Waterloo GLMM talk
PDF
Bolker esa2014
PDF
Montpellier
PDF
virulence evolution (IGERT symposium)
PDF
Igert glmm
PDF
Davis eco-evo virulence
PDF
Google lme4
PDF
intro to knitr with RStudio
PDF
Stats sem 2013
PDF
computational science & engineering seminar, 16 oct 2013
PDF
Threads 2013
PDF
Threads 2013
evolution of virulence: devil in the details
model complexity and model choice for animal movement models
model complexity and model choice for animal movement models
Fundamental principles (?) of biological data
ESS of minimal mutation rate in an evo-epidemiological model
math bio for 1st year math students
MBRS detectability talk
Waterloo GLMM talk
Waterloo GLMM talk
Bolker esa2014
Montpellier
virulence evolution (IGERT symposium)
Igert glmm
Davis eco-evo virulence
Google lme4
intro to knitr with RStudio
Stats sem 2013
computational science & engineering seminar, 16 oct 2013
Threads 2013
Threads 2013
Ad

GLMs and extensions in R

  • 1. Generalized linear models, and extensions, in R Ben Bolker Departments of Mathematics & Statistics and Biology, McMaster University 7 January 2011 Ben Bolker (McMaster University) GLMs in R 7 January 2011 1 / 25
  • 2. 1 Introduction 2 Example 3 Challenges, tricks, extensions 4 (Extended examples) Ben Bolker (McMaster University) GLMs in R 7 January 2011 2 / 25
  • 3. What are generalized linear models? Modeling framework to solve two common statistical problems: Non-normal data Non-linearity (continuous predictors) . . . superset of, and often confused with, “general” linear models (i.e. ANOVA/ANCOVA/regression: SAS PROC GLM) Ben Bolker (McMaster University) GLMs in R 7 January 2011 3 / 25
  • 4. GLMs: technical details Constraints: Distributions from exponential family (Normal, Poisson, binomial, Gamma, inverse Gaussian) Invertible nonlinearities, i.e. there exists a link function that would make the relationship linear (log, logit, probit, inverse, square root, “cauchit” . . . ) , Efficient, stable algorithm: iteratively re-weighted least squares (IRLS) / Fisher scoring) standard methods (methods(class="glm")): coef, summary, plot, predict, residuals, vcov, profile, update, confint, simulate, anova, add1/drop1, logLik, AIC, . . . logistic and Poisson regression probably make up 99% of GLMs . . . Ben Bolker (McMaster University) GLMs in R 7 January 2011 4 / 25
  • 5. Google scholar scraping logistic+regression q 580000 Poisson+regression q 39300 generalized+linear+model q 28700 binomial+regression q 13500 104 104.5 105 105.5 106 Ghits Ben Bolker (McMaster University) GLMs in R 7 January 2011 5 / 25
  • 6. Example: reed frog predation data 1.0 Vonesh and Bolker (2005): 0.8 q > library(emdbook) Fraction killed 0.6 q q > data(ReedfrogFuncresp) q q q > glm1 <- glm(Killed/Initial~ q q 0.4 q q q q Initial, q q weight=Initial, 0.2 q family=binomial, q data=ReedfrogFuncresp) 0.0 20 40 60 80 100 Initial density Ben Bolker (McMaster University) GLMs in R 7 January 2011 6 / 25
  • 7. Summary > summary(glm1) Call: glm(formula = Killed/Initial ~ Initial, family = binomial, data = ReedfrogFuncresp, weights = Initial) Deviance Residuals: Min 1Q Median 3Q Max -4.4132 -0.7275 0.4347 1.0120 1.8172 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.094563 0.188952 -0.50 0.61675 Initial -0.008416 0.002697 -3.12 0.00181 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 47.518 on 15 degrees of freedom Residual deviance: 37.717 on 14 degrees of freedom AIC: 98.639 Number of Fisher Scoring iterations: 4 Ben Bolker (McMaster University) GLMs in R 7 January 2011 7 / 25
  • 8. Diagnostics. Diagnostic plots inherit from plot.lm. Overdispersion: under the model, the residual deviance should be approximately χ² distributed with n−p degrees of freedom (Venables and Ripley, 2002, p. 209): sum(residuals(glm1, type="pearson")^2) = 34.3: p < 0.05. [Figure: plot(glm1) panels: Residuals vs Fitted, Normal Q−Q, Scale−Location, Residuals vs Leverage (with Cook's distance contours); observations 5, 11, 13 and 16 are labeled]
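The overdispersion check above is short arithmetic once the Pearson chi-square is in hand. The sketch below plugs in the values quoted on the slides (34.3 on 14 residual degrees of freedom) and then wraps the same steps in a helper; the helper name overdisp_check is hypothetical, not a function from any package.

```r
## Values copied from the slides
pearson_chisq <- 34.3
df_resid <- 14

phi_hat <- pearson_chisq / df_resid   # estimated dispersion, about 2.45
p_value <- pchisq(pearson_chisq, df = df_resid, lower.tail = FALSE)

## Hypothetical helper: the same check for any fitted glm object
overdisp_check <- function(fit) {
  chisq <- sum(residuals(fit, type = "pearson")^2)
  rdf <- df.residual(fit)
  c(chisq = chisq, ratio = chisq / rdf,
    p = pchisq(chisq, df = rdf, lower.tail = FALSE))
}
```

With the fitted model available, overdisp_check(glm1) would carry out the same calculation directly.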
  • 9. Inference. Coefficients: may be hard to communicate (they reflect differences on the scale of the linear predictor, e.g. logit/log-odds differences). Wald statistics: beware the Hauck-Donner effect (Venables and Ripley, 2002, p. 198). Wald CI of slope: stats:::confint.lm(glm1) gives (-0.0142, -0.0026). Likelihood ratio test, via anova:
      > anova(glm1,test="Chisq")
      ## OR
      > glm0 <- update(glm1, . ~ . - Initial)
      > anova(glm1,glm0,test="Chisq")
    Likelihood profiles (via MASS::profile.glm), profile confidence intervals: MASS:::confint.glm(glm1) gives (-0.0137, -0.0031).
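The Wald interval and the likelihood ratio test can be reproduced by hand from the numbers in the summary output. This is a sketch using the printed estimate, standard error, and deviances; the variable names are illustrative only.

```r
## Values copied from summary(glm1) on the slides
est <- -0.008416      # slope for Initial
se <- 0.002697        # its standard error
df_resid <- 14        # residual degrees of freedom

## Wald interval with t quantiles (what confint.lm uses)
wald_t <- est + c(-1, 1) * qt(0.975, df_resid) * se

## Wald interval with normal quantiles
wald_z <- est + c(-1, 1) * qnorm(0.975) * se

## Likelihood ratio test: drop in deviance from null to fitted model
lrt_stat <- 47.518 - 37.717   # on 15 - 14 = 1 degree of freedom
lrt_p <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

round(rbind(wald_t, wald_z), 4)
```

The t-based interval matches the (-0.0142, -0.0026) quoted above.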
  • 10. Estimation issues. Convergence difficulties, especially with non-standard links: set starting values, center/scale variables (?). Complete separation: brglm, logistf, arm (bayesglm). Big data: biglm (bigglm). Many predictors (penalized regression): glmnet, glmpath, penalized (see the Machine Learning task view).
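Complete separation is easy to reproduce in base R, which helps in recognizing it before reaching for the packages listed above. The sketch below uses a made-up predictor that splits the responses perfectly; it only demonstrates the symptom and does not call brglm, logistf, or bayesglm.

```r
## Made-up data: every response with x > 5 is a success
x <- 1:10
y <- as.integer(x > 5)

## glm() warns that fitted probabilities are numerically 0 or 1
fit_sep <- suppressWarnings(glm(y ~ x, family = binomial))

## Symptoms: a very large slope and an even larger standard error
coef(summary(fit_sep))
range(fitted(fit_sep))
```

The maximum likelihood estimate does not exist here (the likelihood keeps increasing as the slope grows), so the printed coefficients reflect only where the iterations stopped.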
  • 11. Tricks (within GLM framework). Non-standard link functions: fitting hyperbolic models of predator attack rates (Michaelis-Menten) via binomial/inverse link (http://emdbolker.wikidot.com/voneshglm); exponential survivorship models via binomial/log link (Strong et al., 1999; Tiwari et al., 2006). Gaussian family with log link: fit exponential growth models with constant variance. Subtleties with Gamma GLMs and the dispersion parameter: V&R MASS online complements, Paul Johnson’s notes. Offsets: variation in sampling area/intensity (e.g. strict proportionality).
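The offset trick can be sketched with a Poisson model in which counts are strictly proportional to sampled area. The data below are simulated for illustration (the variable names x, area, and count are hypothetical); the offset enters the linear predictor with its coefficient fixed at 1.

```r
## Simulated data: expected count = area * exp(0.2 + 0.5 * x)
set.seed(7)
n <- 200
toy <- data.frame(x = runif(n, -1, 1),
                  area = sample(1:10, n, replace = TRUE))
toy$count <- rpois(n, lambda = toy$area * exp(0.2 + 0.5 * toy$x))

## log(area) is added to the linear predictor with coefficient fixed at 1,
## so the remaining coefficients describe the rate per unit area
fit_off <- glm(count ~ x + offset(log(area)),
               family = poisson, data = toy)
coef(fit_off)
```

Because the link is log, adding log(area) on the linear-predictor scale multiplies the expected count by area, which is exactly the strict-proportionality assumption.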
  • 12. Overdispersion. Quasilikelihood models:
      > glmQ <- update(glm1,family="quasibinomial")
      > anova(glmQ,test="F")
    (φ̂ = 2.45). No likelihood: qAIC requires some contortions. Extended GLMs: negative binomial: MASS (glm.nb); beta-binomial: aod (betabin), gnlm (gnlr), VGAM (vglm), bbmle (mle2). GLMMs: lognormal-Poisson, logit-normal-binomial. Robust estimation (lmtest, sandwich):
      > coeftest(glm1,vcov=sandwich)
    See also the vignette for the pscl package.
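What the quasibinomial fit changes can be checked directly: the coefficients stay the same, and the standard errors are inflated by the square root of the estimated dispersion. The sketch below uses simulated overdispersed data (hypothetical, not the reed frog data) so that it runs in base R.

```r
## Simulated overdispersed data: the kill probability varies between trials
set.seed(42)
n <- 30
toy <- data.frame(Initial = rep(c(10, 20, 50), length.out = n))
p_i <- rbeta(n, shape1 = 2, shape2 = 3)   # extra-binomial variation
toy$Killed <- rbinom(n, size = toy$Initial, prob = p_i)

fit_b <- glm(cbind(Killed, Initial - Killed) ~ Initial,
             family = binomial, data = toy)
fit_q <- update(fit_b, family = quasibinomial)

phi_hat <- summary(fit_q)$dispersion   # Pearson chi-square / residual df
se_b <- coef(summary(fit_b))[, "Std. Error"]
se_q <- coef(summary(fit_q))[, "Std. Error"]

all.equal(coef(fit_b), coef(fit_q))        # same point estimates
all.equal(se_q, se_b * sqrt(phi_hat))      # SEs scaled by sqrt(phi_hat)
```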
  • 13. Extensions. Generalized additive models (Wood, 2006): mgcv, gamlss. Zero-inflated/altered/hurdle models: pscl, VGAM. Beta regression: betareg. Generalized regression models: bbmle, VGAM, gnlm. Random effects (generalized linear mixed models): lme4 and other packages (http://glmm.wikidot.com/faq).
  • 14. References
      Strong, D.R., Whipple, A.V., et al., 1999. Ecology, 80:2750–2761.
      Tiwari, M., Bjorndal, K.A., et al., 2006. Marine Ecology Progress Series, 326:283–293.
      Venables, W. and Ripley, B.D., 2002. Modern Applied Statistics with S. Springer, New York, 4th edition.
      Vonesh, J.R. and Bolker, B.M., 2005. Ecology, 86(6):1580–1591.
      Wood, S.N., 2006. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC.
  • 15. Basic ggplot code
      > ## ggplot2 >= 2.0 passes the family to glm() through method.args
      > qplot(Initial,Killed/Initial,data=ReedfrogFuncresp)+
          geom_smooth(method="glm",method.args=list(family=binomial),
                      aes(weight=Initial,group=NA))
  • 16. Confidence intervals on # killed, by hand
      > pframe <- data.frame(Initial=1:100)
      > ## predictions and standard errors on the link (logit) scale
      > pp <- predict(glm1,newdata=pframe,se.fit=TRUE)
      > ## fit and Wald limits, back-transformed to the probability scale
      > pmat <- with(pp,plogis(cbind(fit,
                                     fit-1.96*se.fit,
                                     fit+1.96*se.fit)))
      > par(bty="l",las=1)
      > with(ReedfrogFuncresp,plot(Initial,Killed/Initial,
                                   xlim=c(0,100),ylim=c(0,1),
                                   pch=16))
      > matlines(pframe$Initial,pmat,lty=c(1,2,2),col=1,type="l")
  • 17. Prediction intervals
      > simhack <- function(params) {
          glmnew <- glm1
          glmnew$coefficients <- params
          ## simulate() draws from the stored fitted values, so update them too
          glmnew$fitted.values <- drop(plogis(model.matrix(glm1) %*% params))
          ## simulates on PROBABILITY scale
          simulate(glmnew)[[1]]
        }
      > set.seed(101)
      > params <- MASS::mvrnorm(1000,mu=coef(glm1),
                                Sigma=vcov(glm1))
      > sims <- apply(params,1,simhack)
      > qmat <- t(apply(sims,1,quantile,
                        c(0.5,0.025,0.975)))
    (Constructing the simulated values at Initial densities from 1 to 100 is a bit more work; ideally all simulate methods would have newdata and newparam arguments . . . )
    [Figure: Killed/Initial (0 to 1) vs. Initial (0 to 100)]
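One way to do the "bit more work" of simulating at new densities is to bypass simulate() and draw binomial outcomes directly from each sampled parameter vector. This is a hedged sketch of that idea, not the author's method: the data are simulated stand-ins, and the names toy, fit, and sim_one are hypothetical.

```r
library(MASS)  # mvrnorm(); MASS ships with standard R installations

## Hypothetical stand-in data (NOT the ReedfrogFuncresp values)
set.seed(101)
toy <- data.frame(Initial = rep(c(5, 10, 25, 50, 100), each = 3))
toy$Killed <- rbinom(nrow(toy), size = toy$Initial,
                     prob = plogis(-0.1 - 0.008 * toy$Initial))
fit <- glm(Killed/Initial ~ Initial, weights = Initial,
           family = binomial, data = toy)

## New densities and their design matrix
pframe <- data.frame(Initial = 1:100)
X <- model.matrix(~ Initial, data = pframe)

## Parameter uncertainty: draws from the approximate sampling distribution
params <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))

## Process uncertainty: binomial draw at each new density, as a proportion
sim_one <- function(beta) {
  prob <- plogis(drop(X %*% beta))
  rbinom(nrow(pframe), size = pframe$Initial, prob = prob) / pframe$Initial
}
sims <- apply(params, 1, sim_one)   # 100 densities x 1000 draws

## Median and 95% prediction limits at each density
qmat <- t(apply(sims, 1, quantile, c(0.5, 0.025, 0.975)))
```

The resulting qmat has the same column layout as in the slide code, so it could be drawn with matlines(pframe$Initial, qmat, ...) in the same way.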
  • 18. Alternative display (display, coefplot from arm package)
      > display(glm1)
      glm(formula = Killed/Initial ~ Initial, family = binomial, data = Re
          weights = Initial)
                  coef.est coef.se
      (Intercept) -0.09     0.19
      Initial     -0.01     0.00
      ---
      n = 16, k = 2
      residual deviance = 37.7, null deviance = 47.5 (difference = 9.8)
    [Figure: coefplot of the Initial coefficient, x-axis from −0.015 to 0.000]
  • 19. Beta-binomial with aod
      > library(aod)
      > glmBB1 <- betabin(cbind(Killed, Initial-Killed)~Initial,
                          random=~1,
                          data=ReedfrogFuncresp)
  • 20. Beta-binomial with bbmle
      > library(bbmle)
      > glmBB3 <- mle2(Killed~dbetabinom(prob=plogis(logitp),
                                         theta=exp(logtheta),size=Initial),
                       parameters=list(logitp~Initial),
                       data=ReedfrogFuncresp,
                       start=list(logitp=0,logtheta=0))
  • 21. Beta-binomial with VGAM
      > library(VGAM)
      > glmBB4 <- vglm(cbind(Killed,Initial-Killed)~Initial,
                       betabinomial,
                       data=ReedfrogFuncresp)
      > coef(glmBB4,matrix=TRUE)
  • 22. Beta-binomial with gnlm
      > library(gnlm)
      > attach(ReedfrogFuncresp) ## no data= argument!
      > glmBB2 <- gnlr(cbind(Killed,Initial-Killed),
                       dist="beta binomial",
                       pmu=c(0,0),pshape=0,
                       mu=function(p,linear) plogis(linear),
                       linear=~Initial)
      > detach(ReedfrogFuncresp)
      > detach("package:gnlm")
      > detach("package:rmutil")
  • 23. Logit-normal-binomial with lme4
      > library(lme4)
      > ## observation-level random effect: one ID per row
      > ReedfrogFuncresp$ID <- 1:nrow(ReedfrogFuncresp)
      > glmLNP <- glmer(cbind(Killed,Initial-Killed)~Initial+(1|ID),
                        family=binomial,
                        data=ReedfrogFuncresp)
      > summary(glmLNP)
  • 24. Alternate link functions for reed frog data. [Figure: fraction killed (0 to 1) vs. initial density (20 to 100)]
  • 25. Comparing overdispersion estimates. [Figure: estimates of the initial density effect (x-axis, −0.015 to 0.000) by model: LN−binomial, beta−binomial, sandwich, q−binom Wald, binomial profile, binomial Wald]