Original data set cannot be safely accessed when the estimation is performed in a loop

### Example

```R
library(fixest)

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
all_species = unique(base$species)
res_all = list()
for(i in 1:3){
    s = all_species[i]
    res_all[[i]] = feols(y ~ x1 + x2, base[base$species == s, ])
}

# Have a look at the mean of the dependent variable:
etable(res_all, fitstat = "my")
#>                            model 1            model 2            model 3
#> Dependent Var.:                  y                  y                  y
#>                                                                         
#> (Intercept)      2.304*** (0.3853)  2.116*** (0.4943)    0.6248 (0.5249)
#> x1              0.6674*** (0.0904)    0.2476 (0.1868)   0.2600. (0.1533)
#> x2                 0.2834 (0.1972) 0.7356*** (0.1248) 0.9348*** (0.0896)
#> _______________ __________________ __________________ __________________
#> S.E. type                      IID                IID                IID
#> Dep. Var. mean              6.5880             6.5880             6.5880
```

### What happens

In `fixest` many computations are delayed, occurring after the estimation and *only* at the user's request.
This is true for computing the standard-errors (when clustering w.r.t. a variable not used in the estimation) and for several fit statistics.

For post-computation to work properly, the original data, the one used for the estimation, needs to be accessed. 
Of course, one solution would be to store the original data in the estimation object. This would be safe but would come at an exorbitant memory price.
Currently, the data is accessed by using the same data call as in the estimation: it is just an access to a data set currently stored in memory. Following the example, `base[base$species == s, ]` is evaluated to get the data. 

In general, this is fine. Now comes the loop problem. When information have to be computed after the loop, like in the example the mean of the dependent variable, for all three models the data is accessed with `base[base$species == s, ]`, leading to erroneously use the same data set. Hence leading to wrong results.

### Solutions

#### VCOV

This problem affects the VCOV if clustering (or any other VCOV using an extra variable) has to be performed ex-post. 
The solution is then to use the argument `vcov` at estimation time. 

Note that you will not be able to navigate through different VCOVs (other than standard and heteroskedasticity-robust) ex post. If needed, you will have to store the object with the different VCOVs within the loop, while the right data is accessible.


#### fit statistics

Currently there is no way to store the fit statistics at estimation time.
There is a major overhaul of the fit statistics mechanism under way. Once the new fit-stats are implemented, that will be possible and easy (basically just using `fitstat = TRUE` will do).

In the short run, if you want to use data-dependent fit-stats ex post, you need to do it manually. FWIW, here's an example:

```R
# A) create a custom fit stat
my_meandep = function(x){
    # x: fixest object
    if("mean_dep" %in% names(x)) x$mean_dep else NA
}

# B) register it
fitstat_register("meandep", my_meandep, "Mean of the Dep. Var.")

# C) within the loop, save the requested information 

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
all_species = unique(base$species)
res_all = list()
for(i in 1:3){
    s = all_species[i]
    
    # C') => we create the slot mean_dep
    mod = feols(y ~ x1 + x2, base[base$species == s, ])
    mod$mean_dep = fitstat(mod, "my", simplify = TRUE)
    res_all[[i]] = mod
}

# D) use it
etable(res_all, fitstat = ~my + meandep)
#>                                  model 1            model 2            model 3
#> Dependent Var.:                        y                  y                  y
#>                                                                               
#> (Intercept)            2.304*** (0.3853)  2.116*** (0.4943)    0.6248 (0.5249)
#> x1                    0.6674*** (0.0904)    0.2476 (0.1868)   0.2600. (0.1533)
#> x2                       0.2834 (0.1972) 0.7356*** (0.1248) 0.9348*** (0.0896)
#> _____________________ __________________ __________________ __________________
#> S.E. type                            IID                IID                IID
#> Dep. Var. mean                    6.5880             6.5880             6.5880
#> Mean of the Dep. Var.             5.0060             5.9360             6.5880

```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Original data set cannot be safely accessed when the estimation is performed in a loop #340

Example

What happens

Solutions

VCOV

fit statistics

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Original data set cannot be safely accessed when the estimation is performed in a loop #340

Description

Example

What happens

Solutions

VCOV

fit statistics

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions