Closed
Description
The problem
Using an offset term in a Poisson regression gives inconsistent behaviour depending on whether the formula is supplied via add_formula() or via fit(): the offset is silently ignored in the workflow, but honoured when the model spec is fitted directly. I would expect consistent behaviour, or an error message saying that offset terms are not supported in add_formula().
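For comparison, here is a minimal sketch of one way an offset might be passed through a workflow. It assumes that the `formula` argument of `add_model()` hands the model formula to the engine unchanged, and that `add_variables()` passes `offset_var` through without preprocessing. I have not confirmed this against the package versions listed in the session info, so treat it as a possible workaround rather than the intended API.

```r
library(parsnip)
library(workflows)
library(poissonreg)

# Same toy data as in the reproducible example
data <- data.frame(
  events = c(1, 2, 4, 1, 2),
  var = factor(c("A", "B", "B", "A", "A")),
  offset_var = c(10, 11, 50, 20, 10)
)

poisson_spec <- set_engine(poisson_reg(), "glm")

# Assumption: add_variables() only selects columns, and the model formula
# (including the offset term) is supplied separately through add_model().
wf_with_offset <- workflow()
wf_with_offset <- add_variables(
  wf_with_offset,
  outcomes = events,
  predictors = c(var, offset_var)
)
wf_with_offset <- add_model(
  wf_with_offset,
  poisson_spec,
  formula = events ~ var + offset(log(offset_var))
)

wf_fit <- fit(wf_with_offset, data = data)

# Reference fit with plain glm() for comparison
glm_fit <- glm(events ~ var + offset(log(offset_var)),
               family = poisson(link = "log"),
               data = data)

coef(extract_fit_engine(wf_fit))
coef(glm_fit)
```

If the assumption holds, the two sets of coefficients should agree. If they do not, the offset is being dropped on this path as well.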
Reproducible example
library(tidyverse)
library(tidymodels)
library(poissonreg)
# data
data.frame(
events = c(1, 2, 4, 1, 2),
var = factor(c("A", "B", "B", "A", "A")),
offset_var = c(10, 11, 50, 20, 10)
) -> data
# standard glm
glm_with_offset <- glm(events ~ var + offset(log(offset_var)),
family = poisson(link = "log"),
data = data)
glm_without_offset <- glm(events ~ var,
family = poisson(link = "log"),
data = data)
poisson_reg() %>%
set_engine("glm") ->
poisson_spec
# below offset in formula gets ignored without error
workflow() %>%
add_model(poisson_spec) %>%
add_formula(events ~ var + offset(log(offset_var))) ->
poisson_exposure_wf
poisson_exposure_wf %>%
fit(data = data) ->
tidymodel_without_offset
# the above model is equivalent to glm_without_offset
# tidy workflow (working)
poisson_spec %>%
fit(events ~ var + offset(log(offset_var)), data = data) ->
tidymodel_with_offset
# these models are the same
tidymodel_with_offset
#> parsnip model object
#>
#>
#> Call: stats::glm(formula = events ~ var + offset(log(offset_var)),
#> family = stats::poisson, data = data)
#>
#> Coefficients:
#> (Intercept) varB
#> -2.30259 -0.01653
#>
#> Degrees of Freedom: 4 Total (i.e. Null); 3 Residual
#> Null Deviance: 2.192
#> Residual Deviance: 2.191 AIC: 18.68
glm_with_offset
#>
#> Call: glm(formula = events ~ var + offset(log(offset_var)), family = poisson(link = "log"),
#> data = data)
#>
#> Coefficients:
#> (Intercept) varB
#> -2.30259 -0.01653
#>
#> Degrees of Freedom: 4 Total (i.e. Null); 3 Residual
#> Null Deviance: 2.192
#> Residual Deviance: 2.191 AIC: 18.68
# these two are the same
tidymodel_without_offset
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: poisson_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> events ~ var + offset(log(offset_var))
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Call: stats::glm(formula = ..y ~ ., family = stats::poisson, data = data)
#>
#> Coefficients:
#> (Intercept) varB
#> 0.2877 0.8109
#>
#> Degrees of Freedom: 4 Total (i.e. Null); 3 Residual
#> Null Deviance: 2.773
#> Residual Deviance: 1.151 AIC: 17.64
glm_without_offset
#>
#> Call: glm(formula = events ~ var, family = poisson(link = "log"), data = data)
#>
#> Coefficients:
#> (Intercept) varB
#> 0.2877 0.8109
#>
#> Degrees of Freedom: 4 Total (i.e. Null); 3 Residual
#> Null Deviance: 2.773
#> Residual Deviance: 1.151 AIC: 17.64
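The two sets of coefficients above can be checked by hand, which makes it clear which fit used the offset. This is a small base R sketch. It relies on the fact that, for a Poisson model with a single factor predictor, the fitted rate in each group is total events divided by total exposure (with the offset) or the mean count (without it). The variable names introduced here are only for illustration.

```r
data <- data.frame(
  events = c(1, 2, 4, 1, 2),
  var = factor(c("A", "B", "B", "A", "A")),
  offset_var = c(10, 11, 50, 20, 10)
)

# With the offset: group rate = total events / total exposure
events_by_group <- tapply(data$events, data$var, sum)
exposure_by_group <- tapply(data$offset_var, data$var, sum)
log_rate <- log(events_by_group / exposure_by_group)
coef_with_offset <- c(
  intercept = log_rate[["A"]],
  varB = log_rate[["B"]] - log_rate[["A"]]
)

# Without the offset: group mean = average count per row
log_mean <- log(tapply(data$events, data$var, mean))
coef_without_offset <- c(
  intercept = log_mean[["A"]],
  varB = log_mean[["B"]] - log_mean[["A"]]
)

round(coef_with_offset, 5)
#> intercept      varB
#>  -2.30259  -0.01653
round(coef_without_offset, 4)
#> intercept      varB
#>    0.2877    0.8109
```

The workflow fit reproduces the second pair, which confirms that the offset term was dropped rather than estimated differently.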
Created on 2022-08-04 by the reprex package (v2.0.1)
Session info
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.6 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] poissonreg_1.0.0 yardstick_1.0.0 workflowsets_1.0.0 workflows_1.0.0
#> [5] tune_1.0.0 rsample_1.0.0 recipes_1.0.1 parsnip_1.0.0
#> [9] modeldata_1.0.0 infer_1.0.2 dials_1.0.0 scales_1.2.0
#> [13] broom_1.0.0 tidymodels_1.0.0 forcats_0.5.1 stringr_1.4.0
#> [17] dplyr_1.0.9 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
#> [21] tibble_3.1.8 ggplot2_3.3.6 tidyverse_1.3.2
#>
#> loaded via a namespace (and not attached):
#> [1] fs_1.5.2 lubridate_1.8.0 DiceDesign_1.9
#> [4] httr_1.4.3 tools_4.1.1 backports_1.4.1
#> [7] utf8_1.2.2 R6_2.5.1 rpart_4.1-15
#> [10] DBI_1.1.3 colorspace_2.0-3 nnet_7.3-16
#> [13] withr_2.5.0 tidyselect_1.1.2 compiler_4.1.1
#> [16] cli_3.3.0 rvest_1.0.2 xml2_1.3.3
#> [19] digest_0.6.29 rmarkdown_2.14 pkgconfig_2.0.3
#> [22] htmltools_0.5.3 parallelly_1.32.1 lhs_1.1.5
#> [25] dbplyr_2.2.1 fastmap_1.1.0 highr_0.9
#> [28] rlang_1.0.4 readxl_1.4.0 rstudioapi_0.13
#> [31] generics_0.1.3 jsonlite_1.8.0 googlesheets4_1.0.0
#> [34] magrittr_2.0.3 Matrix_1.3-4 GPfit_1.0-8
#> [37] Rcpp_1.0.9 munsell_0.5.0 fansi_1.0.3
#> [40] furrr_0.3.0 lifecycle_1.0.1 stringi_1.7.8
#> [43] yaml_2.3.5 MASS_7.3-54 grid_4.1.1
#> [46] parallel_4.1.1 listenv_0.8.0 crayon_1.5.1
#> [49] lattice_0.20-44 haven_2.5.0 splines_4.1.1
#> [52] hms_1.1.1 knitr_1.39 pillar_1.8.0
#> [55] future.apply_1.9.0 codetools_0.2-18 reprex_2.0.1
#> [58] glue_1.6.2 evaluate_0.15 modelr_0.1.8
#> [61] foreach_1.5.2 vctrs_0.4.1 tzdb_0.3.0
#> [64] cellranger_1.1.0 gtable_0.3.0 future_1.27.0
#> [67] assertthat_0.2.1 xfun_0.31 gower_1.0.0
#> [70] prodlim_2019.11.13 class_7.3-19 survival_3.2-11
#> [73] googledrive_2.0.0 gargle_1.2.0 timeDate_4021.104
#> [76] iterators_1.0.14 hardhat_1.2.0 lava_1.6.10
#> [79] globals_0.15.1 ellipsis_0.3.2 ipred_0.9-13