Mon c-5-hartig-2493

DHARMa - Residual diagnostics for
hierarchical statistical models
Talk at ISEC 2018 @florianhartig, Uni Regensburg
Dharma wheel, Sun Temple, Konark, photo credit: Lisa Davis, via Wikimedia commons

Motivation – standard residuals for 2 Poisson regressions
[Figure: standard residual plots for the two models, panels "Model 1" and "Model 2"]

Issues in interpreting residuals for GLMMs + beyond
§ GLMM distributions are typically asymmetric and change
their shape with the mean – this cannot be transformed away by
simply dividing by the expected sd (Pearson residuals)
§ Problems get worse for more complicated GLMMs and
hierarchical models, where the effective distribution of the
residuals arises from a mix of distributions / random
effects
§ Consequence: GLMMs(+) are in practice rarely checked,
although they can have all the same problems we teach for
LMs (e.g. misfit, heteroskedasticity, outliers, …)

Solution: simulation-based residual diagnostics
For any statistical model, we can simulate new data based on the
fitted model. Based on this, we can
§ Compare simulated to observed data, either
– Via summary statistics
– Per data point
§ Or refit (aka parametric bootstrap), and compare refitted to
observed residuals
Not a new idea, but the challenge is to make this user-friendly, and
to understand how to best calculate residuals / tests
– Various methods for simulated residual checks implemented in the
DHARMa package (Hartig, 2017, on CRAN)
– DHARMa = Diagnostics for HierArchical Regression Models, but also
broadly “natural order / law” in Eastern philosophies

Teaser: example workflow in DHARMa

How does this work?
Assume new data simulated …

Option 1: “global” p-values
§ Calculate p-values for “global” (= all data) summary
statistics:
– Zero-inflation test - calculate simulated number of zeros vs.
observed number of zeros
– Dispersion test - calculate simulated vs. observed variance
around model predictions
[Figure: residual variance, simulated vs. observed]
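The zero-inflation check above can be sketched in a few lines. This is a minimal illustration in plain Python, not DHARMa's implementation: it assumes a Poisson model whose fitted means are already known, and the names `rpois` and `zero_inflation_pvalue` are hypothetical helpers introduced here.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) value (Knuth's method, adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def zero_inflation_pvalue(observed, fitted_means, n_sim=1000, seed=1):
    """Share of simulated data sets with at least as many zeros as observed."""
    rng = random.Random(seed)
    obs_zeros = sum(1 for y in observed if y == 0)
    at_least = 0
    for _ in range(n_sim):
        # Simulate one new data set from the (assumed) fitted Poisson model
        sim_zeros = sum(1 for mu in fitted_means if rpois(mu, rng) == 0)
        if sim_zeros >= obs_zeros:
            at_least += 1
    return obs_zeros, at_least / n_sim

# Demo: data with 30% extra zeros, checked against a plain Poisson(3) fit
data_rng = random.Random(42)
fitted_means = [3.0] * 200
observed = [0 if data_rng.random() < 0.3 else rpois(3.0, data_rng)
            for _ in range(200)]
obs_zeros, p_value = zero_inflation_pvalue(observed, fitted_means)
print(obs_zeros, p_value)
```

A small p-value here means the fitted model almost never produces as many zeros as were observed, which points to zero-inflation.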

What we gain: a generalized dispersion test
§ Omnibus dispersion tests for any statistical model
(including observation-level REs, and terms for var / cor
structures, zi terms)
– Simulations show good power, also compared to parametric tests
(disclaimer: depends a bit on the model structure, for some model
structures refit = T required to get proper power)
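A simulation-based dispersion test of this kind might look as follows. The sketch assumes a Poisson model with known fitted means and does not refit the model; the statistic (mean squared deviation from the prediction) and the helper names are illustrative choices, not DHARMa's exact definitions.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) value (Knuth's method, adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def residual_variance(values, means):
    """Mean squared deviation of the data from the model predictions."""
    return sum((y - mu) ** 2 for y, mu in zip(values, means)) / len(values)

def dispersion_test(observed, fitted_means, n_sim=1000, seed=1):
    """Return (observed / mean simulated variance, two-sided p-value)."""
    rng = random.Random(seed)
    obs_var = residual_variance(observed, fitted_means)
    sim_vars = []
    for _ in range(n_sim):
        sim = [rpois(mu, rng) for mu in fitted_means]
        sim_vars.append(residual_variance(sim, fitted_means))
    ratio = obs_var / (sum(sim_vars) / n_sim)
    p_upper = sum(1 for v in sim_vars if v >= obs_var) / n_sim
    p_lower = sum(1 for v in sim_vars if v <= obs_var) / n_sim
    return ratio, min(1.0, 2 * min(p_upper, p_lower))

# Demo: overdispersed counts (gamma-mixed Poisson) checked against Poisson(5)
data_rng = random.Random(7)
fitted_means = [5.0] * 200
observed = [rpois(5.0 * data_rng.gammavariate(2.0, 0.5), data_rng)
            for _ in range(200)]
ratio, p_value = dispersion_test(observed, fitted_means)
print(round(ratio, 2), p_value)
```

A ratio above 1 indicates overdispersion, below 1 underdispersion; the p-value says how unusual the observed variance is among the simulated ones.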

Option 2: calculate p-values per observation
§ Goal: measure how far each data point deviates from the expected
distribution
§ Idea: express this in terms of the cumulative distribution of the
simulated data / residuals → standardizes residuals to [0,1]
Translation: residual = p(x >= X0),
X0 = null distribution from the
fitted model
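As a minimal sketch of this per-observation residual, assume a continuous response and that simulated values for each observation are already available. The function name `scaled_residuals` and the normal model in the demo are assumptions made for illustration only.

```python
import random

def scaled_residuals(observed, simulations):
    """Empirical p(x >= X0) per observation.

    simulations[i] holds the values simulated from the fitted model
    for observation i; the residual is the share of them <= observed[i].
    """
    return [sum(1 for s in sims if s <= y) / len(sims)
            for y, sims in zip(observed, simulations)]

# Demo: three observations, each with fitted mean 10 and sd 2
rng = random.Random(0)
means, sd = [10.0, 10.0, 10.0], 2.0
observed = [10.0, 16.0, 4.0]  # at the mean, 3 sd above, 3 sd below
simulations = [[rng.gauss(mu, sd) for _ in range(1000)] for mu in means]
residuals = scaled_residuals(observed, simulations)
print(residuals)
```

An observation at the centre of its simulated distribution gets a residual near 0.5, while values far in the upper or lower tail approach 1 or 0.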

Key property for these residuals
§ Each residual [0,1] is essentially a p-value: p(x >= X0)
§ Thus: if the fitted model is correct (H0), the residual
distribution p(x >= X0) should be uniform
– Side note: for discrete distributions, it is essential to add some
additional noise to x and X0 to make the distribution flat
(Dunn & Smyth, 1996)
§ Consequence: for ANY hierarchical model structure, if
the model is correct in structure and parameters,
residuals should be uniform!
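The effect of the added noise can be illustrated with a small experiment. The sketch below assumes a correctly specified Poisson model, jitters both observed and simulated counts with uniform noise on (-0.5, 0.5), and measures the distance to uniformity with a Kolmogorov-Smirnov statistic. The helper names and the jitter width are illustrative assumptions, not necessarily what DHARMa does internally.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) value (Knuth's method, adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def quantile_residuals(observed, fitted_means, n_sim=250, seed=1, jitter=True):
    """Per-observation p(x >= X0), optionally with uniform noise on x and X0."""
    rng = random.Random(seed)
    noise = (lambda: rng.uniform(-0.5, 0.5)) if jitter else (lambda: 0.0)
    residuals = []
    for y, mu in zip(observed, fitted_means):
        y_noisy = y + noise()
        sims = [rpois(mu, rng) + noise() for _ in range(n_sim)]
        residuals.append(sum(1 for s in sims if s <= y_noisy) / n_sim)
    return residuals

def ks_distance_to_uniform(values):
    """Kolmogorov-Smirnov distance between the sample and Uniform(0, 1)."""
    u = sorted(values)
    n = len(u)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))

# Demo: data really are Poisson(2), so residuals should be uniform
data_rng = random.Random(3)
fitted_means = [2.0] * 300
observed = [rpois(2.0, data_rng) for _ in range(300)]
d_jitter = ks_distance_to_uniform(quantile_residuals(observed, fitted_means))
d_raw = ks_distance_to_uniform(
    quantile_residuals(observed, fitted_means, jitter=False))
print(round(d_jitter, 3), round(d_raw, 3))
```

Without the noise, the residuals pile up on a handful of values and look non-uniform even though the model is correct; with it, the distance to the uniform distribution shrinks to what sampling variation alone would produce.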

Now we can understand the teaser

DHARMa implements this idea for a wide range of models

A range of further options
§ DHARMa can read in Bayesian posterior predictive
simulations
§ Calculate residuals / dispersion / other arbitrary
summary statistics also per grouping variable
§ Plot / test spatial / temporal autocorrelation
Experience with students and research: extremely helpful,
because it allows one to query / examine the model in much
more detail and understand possible problems

Statistical details and challenges I
§ Simulate conditional / unconditional on fitted REs?
– DHARMa allows changing the conditional structure for the
RE simulations, but default is to re-simulate the entire model
structure (including all REs)
§ Simulate from point estimate, or include uncertainty
of the parameter estimates, as in Bayesian p-values?
– DHARMa currently based on point estimates, and I’m
leaning towards keeping this for frequentist residuals. With
informative priors MLE (not MAP) could even be preferred
for Bayesian model checking (to avoid prior influences), but I
acknowledge that this is philosophically controversial.
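The difference between conditional and unconditional simulation can be made concrete with a toy random-intercept model. Everything in the sketch below is an assumption for illustration (Gaussian response, made-up group effects, the function `simulate_response`); it only shows what "re-simulating the REs" changes, not how DHARMa does it.

```python
import random
import statistics

def simulate_response(fixed_pred, groups, fitted_re, sd_re, sd_resid,
                      rng, conditional):
    """Simulate one data set from a random-intercept model.

    conditional=True keeps the fitted random effects;
    conditional=False redraws them from N(0, sd_re).
    """
    if conditional:
        re = fitted_re
    else:
        re = {g: rng.gauss(0.0, sd_re) for g in fitted_re}
    return [mu + re[g] + rng.gauss(0.0, sd_resid)
            for mu, g in zip(fixed_pred, groups)]

# Demo: group "a" has a fitted random intercept of +2
rng = random.Random(0)
groups = ["a", "a", "b", "b"]
fixed_pred = [10.0] * 4
fitted_re = {"a": 2.0, "b": -2.0}
n_sim = 2000

cond = [simulate_response(fixed_pred, groups, fitted_re, 2.0, 1.0, rng, True)[0]
        for _ in range(n_sim)]
uncond = [simulate_response(fixed_pred, groups, fitted_re, 2.0, 1.0, rng, False)[0]
          for _ in range(n_sim)]

cond_mean, uncond_mean = statistics.mean(cond), statistics.mean(uncond)
cond_sd, uncond_sd = statistics.pstdev(cond), statistics.pstdev(uncond)
print(round(cond_mean, 2), round(uncond_mean, 2))
print(round(cond_sd, 2), round(uncond_sd, 2))
```

Conditional simulations stay centred on the group-specific prediction, whereas unconditional simulations are centred on the fixed-effect prediction and spread more widely, because the random-effect variance is part of the simulated distribution.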

Statistical details and challenges II
§ What are the expected distributions of the calculated
summaries / residuals?
– The devil is in the detail. A common question in forums: plot
residuals (DHARMa or others) against mixed model
predictions including REs → you will see a pattern like this
– This pattern is perfectly normal for a structurally correct
model and (I think) originates from the shrinkage on the REs
– remember: plot residuals against fixed effect predictions
only!
§ Many further examples like this

Statistical details and challenges III
§ How do we display the residuals?
§ I prefer the [0,1] (cdf) scaling because it is neutral, and I
believe that uniformity is easier to check visually than
normality
§ However, the [0,1] scaling hides outliers, i.e. is not
necessarily proportional to leverage on the fit – it would be
desirable to additionally highlight outliers / leverage.

Summary
§ Simulation-based / quantile-based residuals create a
very general and flexible framework for checking any
hierarchical statistical model / GLMM
§ Many open questions that would warrant further
attention:
– Tests, expected distributions / patterns under H0, how to
best display the residuals …
§ Worth doing this work, because in the end our results
are only as good as our ability to choose the right
model - without proper checks, we are operating blind!

Thank you!
And if you want to check the Dharma of your
models, run install.packages("DHARMa")