Advanced Probabilistic Modelling
Bayesian regression models
Irantzu Barrio Beraza
Contents ∣ 2
1 Statistical Bayesian Modelling
2 Bayesian Inference in Linear Models
3 Bayesian Inference in Generalized Linear Models
Statistical Bayesian Modelling
Introduction ∣ 4
So far we have done inference on the parameters of models with a
single variable of interest, unrelated to other variables:
• Proportion 𝑝 of sick people in a city? 𝑋 ∼ 𝐵𝑒𝑟(𝑝)
• Average number 𝜆 of patients admitted to a hospital in an
hour? 𝑍 ∼ 𝑃𝑜(𝜆)
• Mean weight 𝜇 of a fish? 𝑌 ∼ 𝑁(𝜇, 𝜎)
We have made inference about one or two of the parameters of
univariate distributions.
But, explaining real life needs more complex models.
Statistics and Modelling ∣ 5
• In general, a model is a small-scale representation of reality:
> either a description of reality,
> a tool to understand the reality or
> a tool for predicting future behavior.
> The best feature of a model is to be as accurate as possible in its task of
representing reality.
• Statistics allows us to incorporate the variability present in real life in
our models through randomness.
• Still: “essentially, all models are wrong, but some of them are useful”
(Box, 1987).
• “The problem formulation is more essential than its own solution, which
may simply be a mathematical or experimental skill” (Albert Einstein).
Modelling: types of variables ∣ 6
• Once we have the data collected, we have to model.
• When we model a real problem we ask ourselves
> what do we want to explain?
> and based on what?
• This classifies the variables into:
> Response variables: the ones we want to explain
> Explanatory (or independent, or predictor) variables: those that
serve to explain the response variables.
Statistical models ∣ 7
• How do we build a model that reflects the situation we want to
analyze?
• Most statistical models have a structure of the type:
> Response variable to be explained.
> A systematic component that contains the “general”
information of the system under study, and is expressed as a
combination of explanatory variables in the form of a
parametric equation. It thus indicates how the explanatory
variables affect the response.
> A random component that reflects the intrinsic variability in each
particular situation (in each data).
Statistical models (II) ∣ 8
• Depending on the type of variable, the explanatory variables are:
> Qualitative ⇒ Factors (with their corresponding “levels”)
− Fixed effects (if factor levels are preset in advance: Sex)
− Random effects (if the factor levels are a random sample of the
possible levels of that factor: person)
> Quantitative ⇒ Covariates
Statistical models (III) ∣ 9
• Often the systematic component is expressed as a linear
combination (but it can also be non-linear).
• If the response variable is normal and the relationship is linear,
we have a linear model
• Example: explain a person’s weight by its height and age.
𝜇𝑌 = 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2
• Interpretation is not so clear when the explanatory variables are
discrete, or quantitative but observed in categorized form. In
those cases, dummy variables are introduced, allowing us to
describe the status of a discrete variable uniquely and
numerically (see the R sketch below).
• If the response variable belongs to the exponential family
(binomial, Bernoulli, gamma, Poisson, etc.) we have a
generalized linear model.
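As an illustration, a minimal R sketch of how a factor is expanded into dummy variables (the variable name size is illustrative here; a variable of the same name reappears in the bird-extinction example later):

# A qualitative variable with two levels
size <- factor(c("small", "large", "large", "small"))
# model.matrix() builds the design matrix, coding the factor as a dummy column
model.matrix(~ size)
# The column 'sizelarge' equals 1 for "large" and 0 for "small"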
Bayesian Inference in Linear Models
Linear Models ∣ 11
• When modelling a real practical situation, we are often faced
with the problem of explaining a continuous (normal) response
variable as a function of one or several covariates (Linear
Models), one or several factors that explain it (ANOVA), or the
situation in which the explanatory variables are both factors and
covariates (ANCOVA).
• We can unify all these situations in a single linear regression
model, introducing the factors as indicator variables.
• We ask ourselves if each of the covariates (including the
indicators that mark the levels of the qualitative variable) is part
of the model, and then we look for the best model that explains
our response variable among all the possible combinations of
covariates.
Linear Models (II) ∣ 12
The complete model with covariates, factors (in the form of indicator
variables) and interactions has the form:
$$
\begin{aligned}
Y_i = {} & \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} \\
& + \gamma_1 D_{i1} + \dots + \gamma_q D_{iq} \\
& + \delta_{11} X_{i1} D_{i1} + \dots + \delta_{1q} X_{i1} D_{iq} \\
& + \dots \\
& + \delta_{p1} X_{ip} D_{i1} + \dots + \delta_{pq} X_{ip} D_{iq} \\
& + \epsilon_i; \qquad \epsilon_i \sim N(0, \sigma) \quad \forall i = 1, \dots, n
\end{aligned}
$$

$$Y_i \sim N(\mu_i = \mathbf{X}\boldsymbol\beta, \sigma) \quad \forall i = 1, \dots, n$$
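In R, such a complete model can be specified compactly with a formula; a minimal sketch on simulated data (all names illustrative):

set.seed(1)
x1 <- rnorm(50); x2 <- rnorm(50)
d1 <- factor(sample(c("A", "B"), 50, replace = TRUE))
y <- 1 + 0.5*x1 - 0.3*x2 + (d1 == "B")*(0.8 + 0.4*x1) + rnorm(50)
# '*' expands to main effects plus all covariate-by-factor interactions
fit.full <- lm(y ~ (x1 + x2) * d1)
coef(fit.full)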
Linear Models (III) ∣ 13
• Objective: to estimate the parameters 𝜃 = (𝛽, 𝜎²), where
𝛽 = (𝛽0, 𝛽1, … , 𝛽𝑝, 𝛾1, … , 𝛾𝑞, 𝛿11, … , 𝛿𝑝𝑞)
• At the inferential level, both the classical and the Bayesian
approaches have analytical solutions for the parameter estimators.
Linear Models (IV) ∣ 14
• To make inference about the parameters of the linear model:
> Data information via the Likelihood: 𝑃(y, x|𝜃).
> The prior distribution of the parameters: 𝑃(𝜃).
> The posterior distribution of the parameters via Bayes'
Theorem:

$$P(\theta \mid \mathbf{y}, \mathbf{x}) = \frac{P(\mathbf{y}, \mathbf{x}, \theta)}{P(\mathbf{y}, \mathbf{x})} = \frac{P(\theta)\,P(\mathbf{y}, \mathbf{x} \mid \theta)}{P(\mathbf{y}, \mathbf{x})} = \frac{P(\theta)\,P(\mathbf{y}, \mathbf{x} \mid \theta)}{\int P(\theta)\,P(\mathbf{y}, \mathbf{x} \mid \theta)\, d\theta}$$
Linear Models (V) ∣ 15
• The information provided by the experiment
y = (𝑦1, … , 𝑦𝑛) with 𝑌𝑖 ∼ 𝑁(𝜇𝑖, 𝜎),
and the relationship between the response variable and the
covariates, 𝜇𝑖 = X𝛽, can be expressed through the likelihood:

$$l(\boldsymbol\beta, \sigma^2) = P(\mathbf{y}, \mathbf{x} \mid \theta) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{y} - \mathbf{X}\boldsymbol\beta) \right\}.$$

• We can use an uninformative improper prior distribution
(indicating little or no knowledge) about the parameters:

$$P(\boldsymbol\beta, \sigma^2) \propto \underbrace{1 \times 1 \times \dots \times 1}_{\text{length of the vector of parameters } \boldsymbol\beta} \times \frac{1}{\sigma^2}.$$
Linear Models (VI) ∣ 16
• Then, the posterior distribution of the parameters is
proportional to

$$P(\boldsymbol\beta, \sigma^2 \mid \mathbf{y}, \mathbf{x}) \propto P(\boldsymbol\beta, \sigma^2)\,P(\mathbf{y}, \mathbf{x} \mid \boldsymbol\beta, \sigma) \propto \frac{1}{\sigma^2}(2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{y} - \mathbf{X}\boldsymbol\beta) \right\}$$
Joint posterior distribution for the parameters ∣ 17
As we have seen before, we can obtain the posterior distribution of all
parameters:

$$P(\boldsymbol\beta, \sigma^2 \mid \mathbf{y}, \mathbf{x}) = P(\boldsymbol\beta \mid \mathbf{y}, \mathbf{x}, \sigma^2)\,P(\sigma^2 \mid \mathbf{y}, \mathbf{x})$$

where,

$$P(\boldsymbol\beta \mid \mathbf{y}, \mathbf{x}, \sigma^2) = N_k(\hat{\boldsymbol\beta}, (\mathbf{X}'\mathbf{X})^{-1}\sigma^2)$$
$$P(\sigma^2 \mid \mathbf{y}, \mathbf{x}) = \text{Inv-}\chi^2(n - k, \hat{\sigma}^2)$$

being,

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
$$\hat{\sigma}^2 = \tfrac{1}{n-k}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})$$

where $k$ = length of the vector of parameters $\boldsymbol\beta$.

Annotations (translated from Spanish): $\hat{\boldsymbol\beta}$ is a vector of weights (scalar parameters); $\hat{\sigma}^2$ is a scalar value, the sum of squared residuals divided by $n-k$; $(\mathbf{X}'\mathbf{X})^{-1}$ is the inverse of the $(k \times k)$ Gram matrix which, scaled by the simulated $\sigma^2$, gives the variance-covariance matrix of the coefficients in the $k$-dimensional normal.
Extinction of Birds (I) ∣ 18
• We consider a study on the extinction of birds (Ramsey and
Schafer, 1997; Pimm et al. 1988)
• Measurements on breeding pairs of land-bird species were
collected from 16 islands around Britain over the course of
several decades.
• For each species, the dataset contains TIME, the average time of
extinction on the islands where it appeared, NESTING, the
average number of nesting pairs, SIZE, the size of the species
(large or small), and STATUS, the migratory status of the
species (migrant or resident).
• The objective is to fit a model that describes the variation in the
time of extinction of the bird species in terms of the covariates
NESTING, SIZE, and STATUS.
• This dataset is available as birdextinct in the LearnBayes
package.
Extinction of Birds (II) ∣ 19
library(LearnBayes)
data(birdextinct)
summary(birdextinct[,2:5])
## time nesting size status
## Min. : 1.000 Min. : 1.000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 1.907 1st Qu.: 1.448 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 3.180 Median : 2.750 Median :1.0000 Median :1.0000
## Mean : 6.957 Mean : 3.417 Mean :0.5484 Mean :0.6935
## 3rd Qu.: 6.989 3rd Qu.: 4.670 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :58.824 Max. :11.620 Max. :1.0000 Max. :1.0000
Extinction of Birds (III) ∣ 20
plot(density(birdextinct$time))
[Figure: kernel density estimate of TIME (density.default(x = birdextinct$time)); the distribution is strongly right-skewed, which motivates the log transform on the next slide.]
Extinction of Birds (IV) ∣ 21
birdextinct$logtime <- log(birdextinct$time)
birdextinct$size <- factor(birdextinct$size, levels = c(0,1),
labels=c("small", "large"))
birdextinct$status <- factor(birdextinct$status, levels = c(0,1),
labels=c("migrant", "resident"))
summary(birdextinct[,4:6])
## size status logtime
## small:28 migrant :19 Min. :0.0000
## large:34 resident:43 1st Qu.:0.6455
## Median :1.1569
## Mean :1.3284
## 3rd Qu.:1.9413
## Max. :4.0746
Extinction of Birds (V) ∣ 22
attach(birdextinct)
plot(nesting,logtime)
[Figure: scatterplot of logtime against nesting.]
Extinction of Birds (VI) ∣ 23
tapply(logtime, status, mean)
## migrant resident
## 0.8000648 1.5617959
tapply(logtime, size, mean)
## small large
## 1.714829 1.010096
Least-squares fit for SIZE (frequentist approach) ∣ 24
fit <- lm(logtime ~ size, x=TRUE, y=TRUE)
summary(fit)
##
## Call:
## lm(formula = logtime ~ size, x = TRUE, y = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7148 -0.6086 -0.1903 0.3796 2.7196
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.7148 0.1790 9.581 1.05e-13 ***
## sizelarge -0.7047 0.2417 -2.916 0.00498 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9471 on 60 degrees of freedom
## Multiple R-squared: 0.1241, Adjusted R-squared: 0.1095
## F-statistic: 8.502 on 1 and 60 DF, p-value: 0.004983
Linear Models ∣ 25
Joint posterior distribution of all parameters $\boldsymbol\beta = (\beta_0, \beta_1)$:

$$P(\boldsymbol\beta, \sigma^2 \mid \mathbf{y}, \mathbf{x}) = P(\boldsymbol\beta \mid \mathbf{y}, \mathbf{x}, \sigma^2)\,P(\sigma^2 \mid \mathbf{y}, \mathbf{x})$$

where,

$$P(\boldsymbol\beta \mid \mathbf{y}, \mathbf{x}, \sigma^2) = N_k(\hat{\boldsymbol\beta}, (\mathbf{X}'\mathbf{X})^{-1}\sigma^2)$$
$$P(\sigma^2 \mid \mathbf{y}, \mathbf{x}) = \text{Inv-}\chi^2(n - k, \hat{\sigma}^2)$$

being,

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
$$\hat{\sigma}^2 = \tfrac{1}{n-k}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})$$

where $k = 2$.
Computing the joint posterior distribution of 𝛽 and 𝜎 in R ∣ 26
S2 <- sum(fit$residuals^2)/fit$df.residual
sqrt(S2) # residual standard error: summary(fit)$sigma
## [1] 0.9470987
# Simulate from the decomposition of the joint posterior
library("extraDistr") # rinvchisq
library("mnormt")     # rmnorm
# First draw sigma^2 from its scaled inverse chi-squared marginal...
sigma.sim <- rinvchisq(1, nu=fit$df.residual, tau=S2)
# ...then draw beta from its conditional normal given the simulated sigma^2
vbeta <- vcov(fit)/S2 # (X'X)^{-1}
beta.sim <- rmnorm(1, mean=fit$coef, varcov=vbeta*sigma.sim)
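Repeating these two draws many times yields a sample from the joint posterior; a minimal sketch of that composition-sampling loop (this is essentially what the blinreg function does on the next slide):

n.sim <- 2000
sigma2.post <- rinvchisq(n.sim, nu = fit$df.residual, tau = S2)
beta.post <- t(sapply(sigma2.post, function(s2)
  rmnorm(1, mean = fit$coef, varcov = vbeta * s2)))
# Each row of beta.post is one posterior draw of (beta0, beta1)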
Joint posterior distribution of 𝛽 and 𝜎 ∣ 27
We can sample from the joint distribution using the blinreg function:
theta.sample <- blinreg(fit$y, fit$x, 2000)
par(mfrow=c(1,2))
hist(theta.sample$beta[,2],main="SIZE", xlab=expression(beta[1]))
hist(theta.sample$sigma,main="ERROR SD", xlab=expression(sigma))
[Figure: histograms of the simulated posterior draws of β1 (SIZE) and of the error SD σ.]
Summary of the posterior ∣ 28
apply(theta.sample$beta,2,quantile,c(.025,.5,.975))
## X(Intercept) Xsizelarge
## 2.5% 1.356801 -1.1841607
## 50% 1.716315 -0.7147859
## 97.5% 2.068997 -0.2297954
quantile(theta.sample$sigma,c(.025,.5,.975))
## 2.5% 50% 97.5%
## 0.7950566 0.9509763 1.1560229
Exercise ∣ 29
Study the effect of the covariate STATUS
Species multiple Bayesian linear regression ∣ 30
fit.2 <- lm(logtime ~ size + status + nesting, x=TRUE, y=TRUE)
summary(fit.2)
##
## Call:
## lm(formula = logtime ~ size + status + nesting, x = TRUE, y = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8410 -0.2932 -0.0709 0.2165 2.5167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.43087 0.20706 2.081 0.041870 *
## sizelarge -0.65220 0.16667 -3.913 0.000242 ***
## statusresident 0.50417 0.18263 2.761 0.007712 **
## nesting 0.26501 0.03679 7.203 1.33e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Species multiple Bayesian linear regression ∣ 31
theta.multi.sample <- blinreg(fit.2$y, fit.2$x, 5000)
par(mfrow=c(2,2))
hist(theta.multi.sample$beta[,2],
main="SIZE - LARGE", xlab=expression(beta[1]))
hist(theta.multi.sample$beta[,3],
main="STATUS - RESIDENT", xlab=expression(beta[2]))
hist(theta.multi.sample$beta[,4],
main="NESTING", xlab=expression(beta[3]))
hist(theta.multi.sample$sigma,
main="ERROR SD", xlab=expression(sigma))
Species multiple Bayesian linear regression ∣ 32
[Figure: histograms of the posterior draws of β1 (SIZE − LARGE), β2 (STATUS − RESIDENT), β3 (NESTING), and the error SD σ.]
Summary of the posterior ∣ 33
apply(theta.multi.sample$beta,2,quantile,c(.025,.975, 0.5))
## X(Intercept) Xsizelarge Xstatusresident Xnesting
## 2.5% 0.0110419 -0.9934889 0.1494833 0.1889342
## 97.5% 0.8449834 -0.3160767 0.8649601 0.3385768
## 50% 0.4315846 -0.6503806 0.5072949 0.2654665
Other R functions ∣ 34
• To fit a Bayesian linear regression we can also use the function
stan_glm from the rstanarm package.
• See Muth et al. (2018) for further reference.
Other R functions ∣ 35
• Some of the arguments of this function are the following:
> family: by default this function uses the Gaussian distribution, as
with the classical glm function when fitting a linear model.
> prior: the prior distribution for the regression coefficients; by
default a normal prior is used. rstanarm provides a set of
functions to specify priors. If we want a flat
uniform prior we set this to NULL.
> prior_intercept: prior for the intercept; it can be normal, Student
t, or Cauchy. If we want a flat uniform prior we set this to NULL.
> prior_aux: prior for auxiliary parameters such as the error
standard deviation for the Gaussian family.
> algorithm: the estimating approach to use. The default is
"sampling" (MCMC).
> iter: the number of iterations if the MCMC method is used;
the default is 2000 per chain (with 4 chains, giving a posterior
sample of 4000 draws after warm-up).
Extinction of birds with stan_glm ∣ 36
library(rstanarm)
bayes.lm <- stan_glm(logtime ~ size + status + nesting,
                     prior=NULL, prior_intercept=NULL,
                     prior_aux=NULL, seed=111, data=birdextinct)
Extinction of birds with stan_glm ∣ 37
summary(bayes.lm)
##
## Model Info:
## function: stan_glm
## family: gaussian [identity]
## formula: logtime ~ size + status + nesting
## algorithm: sampling
## sample: 4000 (posterior sample size)
## priors: see help('prior_summary')
## observations: 62
## predictors: 4
##
## Estimates:
## mean sd 10% 50% 90%
## (Intercept) 0.4 0.2 0.2 0.4 0.7
## sizelarge -0.6 0.2 -0.9 -0.7 -0.4
## statusresident 0.5 0.2 0.3 0.5 0.7
## nesting 0.3 0.0 0.2 0.3 0.3
## sigma 0.7 0.1 0.6 0.7 0.8
##
Extinction of birds with stan_glm ∣ 38
library(bayesplot)
mcmc_dens(bayes.lm,
pars = c("sizelarge", "statusresident","nesting"))
[Figure: posterior density plots of the sizelarge, statusresident, and nesting coefficients.]
Posterior credible intervals ∣ 39
plot(bayes.lm)
[Figure: posterior point estimates and credible intervals for (Intercept), sizelarge, statusresident, nesting, and sigma.]
Exercise ∣ 40
Repeat the analysis considering a normal prior for 𝛽.
Help: https://mc-stan.org/rstanarm/articles/priors.html
Bayesian Inference in Generalized Linear Models
Introduction ∣ 42
• Generalized Linear Models (GLM) extend the linear regression
model in order to accommodate:
> non-normal responses, e.g. binomial data, frequency data, etc.
> and transformation to linearity
• Well known models are logistic regression, log-linear models for
frequency tables, Poisson regression, Gamma regression, etc.
• The conceptual advantage is that many data analytic problems
with non-normal data are reduced to regression modelling.
Introduction (II) ∣ 43
A Generalized linear model consists of:
• A set of response random variables 𝑌1, … , 𝑌𝑛, independent and
with distributions belonging to the Exponential family
(https://en.wikipedia.org/wiki/Exponential_family).
• A set of explanatory variables 𝑋1, 𝑋2, … , 𝑋𝑝 that, along with a
parametric vector (𝛽0, 𝛽1, … , 𝛽𝑝), form the linear predictor:

$$\eta_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip}, \quad i = 1, \dots, n$$

• A monotonic and differentiable function called the link function 𝑔(),
defining the relationship between the mean of the response
$\mu_i = E(Y_i)$ and the linear predictor:

$$g(\mu_i) = \eta_i$$

• Equivalently: $E[Y_i] = g^{-1}(\beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip})$.
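For example, for a Bernoulli response with the logit link (logistic regression), the link and its inverse are:

$$g(\mu_i) = \log\frac{\mu_i}{1 - \mu_i} = \eta_i, \qquad \mu_i = g^{-1}(\eta_i) = \frac{e^{\eta_i}}{1 + e^{\eta_i}}.$$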
Bayesian analysis in GLM’s ∣ 44
• Start with the corresponding likelihood which contains the
available information about the parameters: the coefficients of
the linear predictor.
• For example, in the case of Poisson data with mean $\lambda_i$ and
logarithm link $g(\lambda_i) = \log(\lambda_i)$, the likelihood is:

$$l(\boldsymbol\beta \mid \mathbf{y}, \mathbf{x}) = \prod_{i=1}^{n} \frac{1}{y_i!}\, \exp\left\{ y_i \mathbf{x}_i^t \boldsymbol\beta - \exp(\mathbf{x}_i^t \boldsymbol\beta) \right\}$$
• In frequentist statistics, maximum likelihood estimates are found
with an iteratively reweighted least squares algorithm, using
either the Newton-Raphson method or Fisher's scoring method.
• But, what about the Bayesian approach? Not much different
conceptually.
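This likelihood is the basic ingredient any sampler needs; a minimal R sketch of the corresponding log-likelihood (up to the additive constant $-\sum_i \log y_i!$, which does not depend on 𝛽; names illustrative):

# Log-likelihood of a Poisson GLM with log link, up to an additive constant
# beta: coefficient vector, y: observed counts, X: n x k design matrix
pois.loglik <- function(beta, y, X) {
  eta <- X %*% beta        # linear predictor x_i' beta
  sum(y * eta - exp(eta))  # sum_i { y_i x_i'beta - exp(x_i'beta) }
}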
Why Bayesian GLM’s ∣ 45
• The Bayesian point of view is not just one more technique in the
field of Statistics.
• It is another way of understanding and performing Statistics.
• And so, when our data lead us to Generalized Linear Models, we
can fit them using both Bayesian and frequentist methods.
• Bayesian statistical analysis has benefited from the explosion of
cheap and powerful desktop computing over the last two decades:
MCMC.
• Bayesian techniques can now be applied to complex
modeling problems where they could not have been applied
previously.
• The Bayesian perspective will probably continue to challenge, and
perhaps supplant, traditional frequentist statistical methods,
which have dominated many disciplines of science for a long time.
Bayesian analysis in GLM’s ∣ 46
• Conceptually, the Bayesian specification is "straightforward".
• Starting with the corresponding likelihood, we "only" need to
assign a prior for the regression coefficients.
• But how to make this assignment? There is no easy answer.
• Easy option: choose conjugate or non-informative independent
priors, for instance normal or flat.
• But, as usual in Bayesian modelling, there is no closed-form
solution available for the posterior distribution of the parameters.
• Here is where numerical techniques come to the rescue, allowing
us to obtain approximations of the posterior distribution (see the
Metropolis sketch below).
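As an illustration, a minimal random-walk Metropolis sketch for the Poisson GLM under a flat prior, so the log-posterior reduces to the pois.loglik function sketched above (the simulated data and the proposal scale 0.05 are arbitrary assumptions):

set.seed(1)
# Simulated toy Poisson-regression data
n <- 100; X <- cbind(1, rnorm(n))
y <- rpois(n, exp(X %*% c(0.5, 0.3)))
n.iter <- 5000
beta.chain <- matrix(NA, n.iter, 2)
beta.cur <- c(0, 0)
for (t in 1:n.iter) {
  beta.prop <- beta.cur + rnorm(2, sd = 0.05)     # random-walk proposal
  logr <- pois.loglik(beta.prop, y, X) - pois.loglik(beta.cur, y, X)
  if (log(runif(1)) < logr) beta.cur <- beta.prop # accept/reject step
  beta.chain[t, ] <- beta.cur
}
# Posterior summaries after discarding a burn-in period
apply(beta.chain[-(1:1000), ], 2, quantile, c(.025, .5, .975))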
Software for Bayesian GLM’s ∣ 47
• To conduct a Bayesian GLM analysis in R, we can use the package arm,
which contains the bayesglm function (Gelman et al., 2010).
• We can also use the R package MCMCpack, which contains
several functions to do so, like MCMClogit,
MCMCpoisson or MCMCprobit.
• Nevertheless, there are more ways to perform Bayesian MCMC
analysis in general which can also be used for GLM's. One of the
most popular is BUGS (Bayesian inference Using Gibbs Sampling)
by Lunn et al. (2000).
http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml
• BUGS can be run in two ways, WinBUGS being the most popular
and OpenBUGS the other option.
• They can be used from R through the packages R2WinBUGS or
BRugs (the interface to OpenBUGS).
Software for Bayesian GLM’s (II) ∣ 48
• MCMC is not the only way to approximate posterior distributions.
• As obtaining posteriors is equivalent to integrating, numerical
integration is another way to do it.
• In particular, the Laplace approximation can be used to integrate
numerically.
• In fact, the Integrated Nested Laplace Approximation (INLA) of
Rue et al. (2009) has lately become a fast and powerful tool.
• It can be easily connected with R: http://www.r-inla.org/
Infant respiratory disease ∣ 49
We consider a study in which the probability of children developing
bronchitis or pneumonia in their first year of life is studied by type of
feeding and sex (it can be found in the library faraway).
y<-rep(c(1,0,1,0,1,0,1,0,1,0,1,0),c(77,381,19,128,47,447,
48,336,16,111,31,433))
sexn<-factor(rep(c("boy","girl"),c(1099,975)))
foodn<-factor(rep(c("Bottle","Suppl","Breast",
"Bottle","Suppl","Breast"),
c(458,147,494,384,127,464)))
db<-data.frame(y,sexn,foodn); summary(db)
## y sexn foodn
## Min. :0.0000 boy :1099 Bottle:842
## 1st Qu.:0.0000 girl: 975 Breast:958
## Median :0.0000 Suppl :274
## Mean :0.1148
## 3rd Qu.:0.0000
## Max. :1.0000
Infant respiratory disease with bayesglm ∣ 50
The bayesglm function represents a kind of shortcut of the Bayesian
approach to inference. Typically, the posterior is not used directly for
making inferences. Instead, an empirical distribution is constructed from
draws of the posterior, and that empirical distribution is what informs the
inferences.
library(arm)
bm.1 <- bayesglm (y ~ sexn + foodn,
family = binomial(link="logit"),
prior.scale=Inf, prior.df=Inf)
# just a test: this should be identical to classical logit
# prior mean by default is 0
bm.2 <- bayesglm (y ~ sexn + foodn,
family = binomial(link="logit"))
# default Cauchy prior with scale 2.5
bm.3 <- bayesglm (y ~ sexn + foodn,
family = binomial(link="logit"),
prior.scale=2.5, prior.df=Inf)
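As a quick check of the first fit (with a flat prior the posterior mode should reproduce the classical logit, as the comment above notes), we can compare the coefficients side by side; a minimal sketch:

glm.0 <- glm(y ~ sexn + foodn, family = binomial(link = "logit"))
# bayesglm with prior.scale = Inf should match the maximum likelihood fit
cbind(classical = coef(glm.0), flat.prior = coef(bm.1))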
Infant respiratory disease with bayesglm ∣ 51
We can retrieve the posterior distribution of all 𝛽 parameters
plot(density(coef(sim(bm.3))[,2]), main="",
xlab="posterior beta for sex")
[Figure: posterior density of the coefficient for sex, centered around −0.3.]
Infant respiratory disease with bayesglm ∣ 52
We can also retrieve the 95% credible interval for the coefficients
apply(coef(sim(bm.3)),2, quantile,c(.025,.5,.975))
## (Intercept) sexngirl foodnBreast foodnSuppl
## 2.5% -1.844987 -0.58824046 -0.9551455 -0.5728774
## 50% -1.599999 -0.32135165 -0.6714483 -0.1620121
## 97.5% -1.391229 -0.05089078 -0.4118225 0.1830197
Recall, in Bayesian Statistics this credible interval is interpreted as: there is
a 95% probability that the true population value of the coefficient for girl
is between −0.59 and −0.05.
Infant respiratory disease with inla ∣ 53
Install inla from https://www.r-inla.org/download-install
install.packages("INLA",repos=c(getOption("repos"),
INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)
# upgrade the package
inla.upgrade()
library(INLA)
## Warning: package 'INLA' was built under R version 4.2.1
# INLA Method
bm.inla1 <- inla(y ~ sexn + foodn, data=db,
                 family = "binomial", control.compute = list(dic = TRUE))
Infant respiratory disease with inla ∣ 54
# INLA Method
# Output Posterior Estimates
round(bm.inla1$summary.fixed, 4)
## mean sd 0.025quant 0.5quant 0.975quant mode
## (Intercept) -1.6135 0.1124 -1.8379 -1.6121 -1.3967 NA
## sexngirl -0.3130 0.1410 -0.5914 -0.3123 -0.0378 NA
## foodnBreast -0.6700 0.1530 -0.9725 -0.6690 -0.3720 NA
## foodnSuppl -0.1728 0.2056 -0.5860 -0.1695 0.2208 NA
Model comparison in Bayesian GLM’s ∣ 55
• Once the inference has been done, we have to find the best
model by selecting among the possible variables to be included in
the model.
• One way (not the only one) to compare models in Bayesian
statistics is the DIC criterion. DIC is the Bayesian counterpart of
AIC or BIC for comparing models. We can fit each model
separately to calculate its DIC or, alternatively, all models can be
fitted simultaneously.
• As usual, we select as best the model with the lowest DIC value
(see the comparison sketch after the output below).
bm.inla1$dic$dic
## [1] 1460.477
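As an illustration, a minimal sketch comparing the DIC of the full model with that of a reduced model without foodn (the reduced model's DIC value depends on the fit and is not shown here):

# Reduced model without the feeding factor
bm.inla2 <- inla(y ~ sexn, data = db,
                 family = "binomial", control.compute = list(dic = TRUE))
# The model with the lower DIC is preferred
c(full = bm.inla1$dic$dic, reduced = bm.inla2$dic$dic)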
Acknowledgement ∣ 56
Note: this document is based on material kindly provided by Professor David
Conesa of the University of Valencia and the Valencia Bayesian Research
Group (http://vabar.es/)
Bibliography ∣ 57
• Albert, J. (2009). Bayesian Computation with R. Springer.