-
Notifications
You must be signed in to change notification settings - Fork 64
Description
Example
library(fixest)
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
all_species = unique(base$species)
res_all = list()
for(i in 1:3){
s = all_species[i]
res_all[[i]] = feols(y ~ x1 + x2, base[base$species == s, ])
}
# Have a look at the mean of the dependent variable:
etable(res_all, fitstat = "my")
#> model 1 model 2 model 3
#> Dependent Var.: y y y
#>
#> (Intercept) 2.304*** (0.3853) 2.116*** (0.4943) 0.6248 (0.5249)
#> x1 0.6674*** (0.0904) 0.2476 (0.1868) 0.2600. (0.1533)
#> x2 0.2834 (0.1972) 0.7356*** (0.1248) 0.9348*** (0.0896)
#> _______________ __________________ __________________ __________________
#> S.E. type IID IID IID
#> Dep. Var. mean 6.5880 6.5880 6.5880
What happens
In fixest
many computations are delayed, occurring after the estimation and only at the user's request.
This is true for computing the standard-errors (when clustering w.r.t. a variable not used in the estimation) and for several fit statistics.
For post-computation to work properly, the original data, the one used for the estimation, needs to be accessed.
Of course, one solution would be to store the original data in the estimation object. This would be safe but would come at an exorbitant memory price.
Currently, the data is accessed by using the same data call as in the estimation: it is just an access to a data set currently stored in memory. Following the example, base[base$species == s, ]
is evaluated to get the data.
In general, this is fine. Now comes the loop problem. When information have to be computed after the loop, like in the example the mean of the dependent variable, for all three models the data is accessed with base[base$species == s, ]
, leading to erroneously use the same data set. Hence leading to wrong results.
Solutions
VCOV
This problem affects the VCOV if clustering (or any other VCOV using an extra variable) has to be performed ex-post.
The solution is then to use the argument vcov
at estimation time.
Note that you will not be able to navigate through different VCOVs (other than standard and heteroskedasticity-robust) ex post. If needed, you will have to store the object with the different VCOVs within the loop, while the right data is accessible.
fit statistics
Currently there is no way to store the fit statistics at estimation time.
There is a major overhaul of the fit statistics mechanism under way. Once the new fit-stats are implemented, that will be possible and easy (basically just using fitstat = TRUE
will do).
In the short run, if you want to use data-dependent fit-stats ex post, you need to do it manually. FWIW, here's an example:
# A) create a custom fit stat
my_meandep = function(x){
# x: fixest object
if("mean_dep" %in% names(x)) x$mean_dep else NA
}
# B) register it
fitstat_register("meandep", my_meandep, "Mean of the Dep. Var.")
# C) within the loop, save the requested information
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
all_species = unique(base$species)
res_all = list()
for(i in 1:3){
s = all_species[i]
# C') => we create the slot mean_dep
mod = feols(y ~ x1 + x2, base[base$species == s, ])
mod$mean_dep = fitstat(mod, "my", simplify = TRUE)
res_all[[i]] = mod
}
# D) use it
etable(res_all, fitstat = ~my + meandep)
#> model 1 model 2 model 3
#> Dependent Var.: y y y
#>
#> (Intercept) 2.304*** (0.3853) 2.116*** (0.4943) 0.6248 (0.5249)
#> x1 0.6674*** (0.0904) 0.2476 (0.1868) 0.2600. (0.1533)
#> x2 0.2834 (0.1972) 0.7356*** (0.1248) 0.9348*** (0.0896)
#> _____________________ __________________ __________________ __________________
#> S.E. type IID IID IID
#> Dep. Var. mean 6.5880 6.5880 6.5880
#> Mean of the Dep. Var. 5.0060 5.9360 6.5880