add_formula not applying offsets and dropping them silently. #162

@lrossouw

The problem

Using offset terms in a Poisson regression results in inconsistent behaviour depending on whether the formula is added via add_formula or passed directly to fit. With add_formula, the offset is silently dropped. I would expect either consistent behaviour or an error message saying that offset terms are not supported in add_formula.
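
For context, here is a minimal base-R sketch of what the offset is expected to do. `offset(log(offset_var))` adds the log exposure to the linear predictor with a fixed coefficient of 1, so the model describes event rates rather than raw counts. The data frame `d` simply copies the values from the reproducible example; the names `d`, `fit_formula` and `fit_argument` are made up for this illustration.

```r
# Same toy data as in the reproducible example
d <- data.frame(
  events = c(1, 2, 4, 1, 2),
  var = factor(c("A", "B", "B", "A", "A")),
  offset_var = c(10, 11, 50, 20, 10)
)

# Offset given inside the formula
fit_formula <- glm(events ~ var + offset(log(offset_var)),
                   family = poisson(link = "log"), data = d)

# Equivalent specification: offset given through glm()'s `offset` argument
fit_argument <- glm(events ~ var, family = poisson(link = "log"),
                    data = d, offset = log(offset_var))

# Both specifications should give the same coefficients
all.equal(coef(fit_formula), coef(fit_argument))

# With the offset, exp(intercept) is the event rate for level "A":
# total events / total exposure = (1 + 1 + 2) / (10 + 20 + 10) = 0.1
exp(coef(fit_formula)[["(Intercept)"]])
```

If the offset is dropped, the intercept instead reflects the mean count for level "A", which is why the two sets of coefficients in the output differ so much.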

Reproducible example

library(tidyverse)
library(tidymodels)
library(poissonreg)

# data
data.frame(
  events = c(1, 2, 4, 1, 2),
  var = factor(c("A", "B", "B", "A", "A")),
  offset_var = c(10, 11, 50, 20, 10)
) -> data

# standard glm
glm_with_offset <- glm(events ~ var + offset(log(offset_var)),
                       family = poisson(link = "log"),
                       data = data)

glm_without_offset <- glm(events ~ var,
                          family = poisson(link = "log"),
                          data = data)


poisson_reg() %>%
  set_engine("glm") ->
  poisson_spec

# the offset in the formula below is ignored without error
workflow() %>%
  add_model(poisson_spec) %>%
  add_formula(events ~ var + offset(log(offset_var))) ->
  poisson_exposure_wf

poisson_exposure_wf %>%
  fit(data = data) ->
  tidymodel_without_offset
# the above model is equivalent to glm_without_offset

# parsnip fit() with the formula passed directly (offset is applied)
poisson_spec %>%
  fit(events ~ var + offset(log(offset_var)), data = data) ->
  tidymodel_with_offset

# these two models are the same (offset applied)
tidymodel_with_offset
#> parsnip model object
#> 
#> 
#> Call:  stats::glm(formula = events ~ var + offset(log(offset_var)), 
#>     family = stats::poisson, data = data)
#> 
#> Coefficients:
#> (Intercept)         varB  
#>    -2.30259     -0.01653  
#> 
#> Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
#> Null Deviance:       2.192 
#> Residual Deviance: 2.191     AIC: 18.68
glm_with_offset
#> 
#> Call:  glm(formula = events ~ var + offset(log(offset_var)), family = poisson(link = "log"), 
#>     data = data)
#> 
#> Coefficients:
#> (Intercept)         varB  
#>    -2.30259     -0.01653  
#> 
#> Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
#> Null Deviance:       2.192 
#> Residual Deviance: 2.191     AIC: 18.68

# these two models are the same (offset dropped)
tidymodel_without_offset
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: poisson_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> events ~ var + offset(log(offset_var))
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> 
#> Call:  stats::glm(formula = ..y ~ ., family = stats::poisson, data = data)
#> 
#> Coefficients:
#> (Intercept)         varB  
#>      0.2877       0.8109  
#> 
#> Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
#> Null Deviance:       2.773 
#> Residual Deviance: 1.151     AIC: 17.64
glm_without_offset
#> 
#> Call:  glm(formula = events ~ var, family = poisson(link = "log"), data = data)
#> 
#> Coefficients:
#> (Intercept)         varB  
#>      0.2877       0.8109  
#> 
#> Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
#> Null Deviance:       2.773 
#> Residual Deviance: 1.151     AIC: 17.64

Created on 2022-08-04 by the reprex package (v2.0.1)
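
A possible workaround, offered as a sketch rather than a confirmed fix: workflows also provides `add_variables()`, and `add_model()` accepts a `formula` argument that is meant to pass a model formula to the engine unchanged. Assuming both behave as documented, the raw columns (including `offset_var`) can be handed to the model and the offset stated in the model formula. The object names `wf_with_offset` and `wf_fit` are made up here, and this has not been checked against the package versions listed in the session info.

```r
library(tidymodels)
library(poissonreg)

# Same toy data as in the reproducible example
data <- data.frame(
  events = c(1, 2, 4, 1, 2),
  var = factor(c("A", "B", "B", "A", "A")),
  offset_var = c(10, 11, 50, 20, 10)
)

poisson_spec <- poisson_reg() %>%
  set_engine("glm")

# Pass the raw columns through with add_variables(), then state the
# offset in the model formula given to add_model() (assumed to reach
# glm() untouched)
wf_with_offset <- workflow() %>%
  add_variables(outcomes = events, predictors = c(var, offset_var)) %>%
  add_model(poisson_spec,
            formula = events ~ var + offset(log(offset_var)))

wf_fit <- fit(wf_with_offset, data = data)

# Reference fit with plain glm() for comparison
glm_with_offset <- glm(events ~ var + offset(log(offset_var)),
                       family = poisson(link = "log"),
                       data = data)

# If the offset was applied, the coefficients should match
all.equal(coef(extract_fit_engine(wf_fit)), coef(glm_with_offset))
```

`add_variables()` is used instead of `add_formula()` so that `var` reaches the model as a factor column under its original name, which keeps the model formula identical to the one used with plain `glm()`.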

Session info
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.6 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] poissonreg_1.0.0   yardstick_1.0.0    workflowsets_1.0.0 workflows_1.0.0   
#>  [5] tune_1.0.0         rsample_1.0.0      recipes_1.0.1      parsnip_1.0.0     
#>  [9] modeldata_1.0.0    infer_1.0.2        dials_1.0.0        scales_1.2.0      
#> [13] broom_1.0.0        tidymodels_1.0.0   forcats_0.5.1      stringr_1.4.0     
#> [17] dplyr_1.0.9        purrr_0.3.4        readr_2.1.2        tidyr_1.2.0       
#> [21] tibble_3.1.8       ggplot2_3.3.6      tidyverse_1.3.2   
#> 
#> loaded via a namespace (and not attached):
#>  [1] fs_1.5.2            lubridate_1.8.0     DiceDesign_1.9     
#>  [4] httr_1.4.3          tools_4.1.1         backports_1.4.1    
#>  [7] utf8_1.2.2          R6_2.5.1            rpart_4.1-15       
#> [10] DBI_1.1.3           colorspace_2.0-3    nnet_7.3-16        
#> [13] withr_2.5.0         tidyselect_1.1.2    compiler_4.1.1     
#> [16] cli_3.3.0           rvest_1.0.2         xml2_1.3.3         
#> [19] digest_0.6.29       rmarkdown_2.14      pkgconfig_2.0.3    
#> [22] htmltools_0.5.3     parallelly_1.32.1   lhs_1.1.5          
#> [25] dbplyr_2.2.1        fastmap_1.1.0       highr_0.9          
#> [28] rlang_1.0.4         readxl_1.4.0        rstudioapi_0.13    
#> [31] generics_0.1.3      jsonlite_1.8.0      googlesheets4_1.0.0
#> [34] magrittr_2.0.3      Matrix_1.3-4        GPfit_1.0-8        
#> [37] Rcpp_1.0.9          munsell_0.5.0       fansi_1.0.3        
#> [40] furrr_0.3.0         lifecycle_1.0.1     stringi_1.7.8      
#> [43] yaml_2.3.5          MASS_7.3-54         grid_4.1.1         
#> [46] parallel_4.1.1      listenv_0.8.0       crayon_1.5.1       
#> [49] lattice_0.20-44     haven_2.5.0         splines_4.1.1      
#> [52] hms_1.1.1           knitr_1.39          pillar_1.8.0       
#> [55] future.apply_1.9.0  codetools_0.2-18    reprex_2.0.1       
#> [58] glue_1.6.2          evaluate_0.15       modelr_0.1.8       
#> [61] foreach_1.5.2       vctrs_0.4.1         tzdb_0.3.0         
#> [64] cellranger_1.1.0    gtable_0.3.0        future_1.27.0      
#> [67] assertthat_0.2.1    xfun_0.31           gower_1.0.0        
#> [70] prodlim_2019.11.13  class_7.3-19        survival_3.2-11    
#> [73] googledrive_2.0.0   gargle_1.2.0        timeDate_4021.104  
#> [76] iterators_1.0.14    hardhat_1.2.0       lava_1.6.10        
#> [79] globals_0.15.1      ellipsis_0.3.2      ipred_0.9-13
