Multivariate analyses & decoding

Kay Henning Brodersen
Translational Neuromodeling Unit (TNU)
Institute for Biomedical Engineering, University of Zurich & ETH Zurich
Machine Learning and Pattern Recognition Group
Department of Computer Science, ETH Zurich
http://people.inf.ethz.ch/bkay
Why multivariate?

Univariate approaches are excellent for localizing activations in individual voxels.



[Figure: bar plots of responses in voxels v1 and v2 under the reward and no-reward conditions; a significant difference (*) in one case, no significant difference (n.s.) in the other.]
Why multivariate?

Multivariate approaches can be used to examine responses that are jointly encoded
in multiple voxels.


[Figure: bar plots of responses in voxels v1 and v2 for orange juice and apple juice show no significant differences (n.s.) in either voxel alone; plotted jointly in the v1–v2 plane, however, the two conditions form separable response patterns.]
Why multivariate?

Multivariate approaches can utilize ‘hidden’ quantities such as coupling strengths.


[Figure: schematic of a dynamic causal model. Hidden neural activities z1(t), z2(t), z3(t), driven by a driving input u1(t) and a modulatory input u2(t) and coupled by connection strengths, generate the observed BOLD signals x1(t), x2(t), x3(t).]
Friston, Harrison & Penny (2003) NeuroImage; Stephan & Friston (2007) Handbook of Brain Connectivity; Stephan et al. (2008) NeuroImage

Overview


1 Introduction

2 Classification

3 Multivariate Bayes

4 Model-based analyses




Encoding vs. decoding




An encoding model g: X_t → Y_t maps from context to brain activity; a decoding model h: Y_t → X_t maps from brain activity back to context.

context (cause or consequence, e.g. condition, stimulus, response, prediction error): X_t ∈ ℝ^d
BOLD signal: Y_t ∈ ℝ^v
Regression vs. classification


Regression model
independent variables (regressors) → f → continuous dependent variable

Classification model
independent variables (features) → f → categorical dependent variable (label)
Univariate vs. multivariate models

A univariate model considers a single voxel at a time:
context X_t ∈ ℝ^d → BOLD signal Y_t ∈ ℝ.
Spatial dependencies between voxels are only introduced afterwards, through random field theory.

A multivariate model considers many voxels at once:
context X_t ∈ ℝ^d → BOLD signal Y_t ∈ ℝ^v, v ≫ 1.
Multivariate models enable inferences on distributed responses without requiring focal activations.
Prediction vs. inference

The goal of prediction is to find a highly accurate encoding or decoding function, e.g. predicting a cognitive state using a brain-machine interface, or predicting a subject-specific diagnostic status.

predictive density: p(X_new | Y_new, X, Y) = ∫ p(X_new | Y_new, θ) p(θ | X, Y) dθ

The goal of inference is to decide between competing hypotheses, e.g. comparing a model that links distributed neuronal activity to a cognitive state with a model that does not, or weighing the evidence for sparse vs. distributed coding.

marginal likelihood (model evidence): p(X | Y) = ∫ p(X | Y, θ) p(θ) dθ
Goodness of fit vs. complexity

Goodness of fit is the degree to which a model explains observed data.
Complexity is the flexibility of a model (including, but not limited to, its number of
parameters).


[Figure: the same data fitted with models of increasing complexity (1 parameter: underfitting; 4 parameters: optimal; 9 parameters: overfitting); legend: truth, data, model.]
We wish to find the model that optimally trades off goodness of fit and complexity.

Bishop (2007) PRML
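To make the trade-off concrete, here is a minimal MATLAB sketch (not from the slides; the generating function, noise level, and polynomial orders are arbitrary choices for illustration):

    rng(1);
    x      = linspace(0, 1, 12)';                  % a few noisy observations
    y      = sin(2*pi*x) + 0.2*randn(size(x));
    x_fine = linspace(0, 1, 200)';
    orders = [1 4 9];                              % few vs. many parameters
    for i = 1:numel(orders)
        p     = polyfit(x, y, orders(i));          % least-squares polynomial fit
        y_hat = polyval(p, x_fine);
        subplot(1, 3, i); plot(x, y, 'ko', x_fine, y_hat, 'b-');
        title(sprintf('order %d', orders(i)));
    end

The low-order fit underfits the data, while the highest-order fit chases the noise (overfits).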


Summary of modelling terminology
                                                                        General Linear Model (GLM)
                                                                        • mass-univariate encoding model
                                                                        • to regress context onto brain activity
                                                                          and find clusters of similar effects

                                       Dynamic Causal Modelling (DCM)
                                       • multivariate encoding model
                                       • to evaluate connectivity
                                         hypotheses

                 Classification
                 • multivariate decoding model
                 • to predict a categorical context
                   label from brain activity

 Multivariate Bayes (MVB)
 • multivariate decoding model
 • to evaluate anatomical and
   coding hypotheses



Overview


1 Introduction

2 Classification

3 Multivariate Bayes

4 Model-based analyses




Constructing a classifier

A principled way of designing a classifier would be to adopt a probabilistic approach:


Y_t → f → that k which maximizes p(X_t = k | Y_t, X, Y)



In practice, classifiers differ in terms of how strictly they implement this principle.

Generative classifiers use Bayes' rule to estimate p(X_t | Y_t) ∝ p(Y_t | X_t) p(X_t).
Examples: Gaussian Naïve Bayes, Linear Discriminant Analysis.

Discriminative classifiers estimate p(X_t | Y_t) directly, without Bayes' theorem.
Examples: logistic regression, Relevance Vector Machine.

Discriminant classifiers estimate f(Y_t) directly.
Examples: Fisher's Linear Discriminant, Support Vector Machine.
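As a hedged illustration (not from the slides), a discriminative classifier such as logistic regression can be fitted in MATLAB with fitglm from the Statistics and Machine Learning Toolbox; the data here are simulated:

    Y   = randn(100, 10);                          % 100 trials x 10 voxel features
    X   = double(rand(100, 1) > 0.5);              % binary labels
    mdl = fitglm(Y, X, 'Distribution', 'binomial');% logistic regression
    p_hat  = predict(mdl, Y);                      % estimates of p(X_t = 1 | Y_t)
    labels = p_hat > 0.5;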
Support vector machine (SVM)

[Figure: example data in a two-dimensional voxel space (v1, v2); a linear SVM separates the classes with a straight decision boundary, a nonlinear SVM with a curved one.]
     Vapnik (1999) Springer; Schölkopf et al. (2002) MIT Press
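A minimal MATLAB sketch (simulated data; assumes the Statistics and Machine Learning Toolbox, not part of the slides):

    rng(0);
    Y = [randn(50, 2) + 1; randn(50, 2) - 1];      % 100 trials x 2 voxels
    X = [ones(50, 1); zeros(50, 1)];               % class labels
    svm_lin = fitcsvm(Y, X);                                 % linear SVM
    svm_rbf = fitcsvm(Y, X, 'KernelFunction', 'rbf');        % nonlinear SVM
    pred    = predict(svm_rbf, Y);                           % predicted labels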

Stages in a classification analysis




feature extraction → classification using cross-validation → performance evaluation
(e.g. Bayesian mixed-effects inference: p = 1 − P(π > π0 | k, n))
Feature extraction for trial-by-trial classification

We can obtain trial-wise estimates of neural activity by filtering the data with a GLM.


Y = Xβ + e, where the design matrix X contains one boxcar regressor per trial. The estimate of the coefficient for the trial-2 regressor (β2) then reflects activity on trial 2, and the vector of trial-wise estimates (β1, β2, …, βp) serves as the feature set.
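A hedged MATLAB sketch of this idea (simulated data, plain boxcars without HRF convolution for brevity; in practice the design matrix would come from the GLM specification, e.g. in SPM):

    n_scans = 200;  n_trials = 20;  n_voxels = 50;
    X = zeros(n_scans, n_trials);
    for t = 1:n_trials
        X((t-1)*10 + (1:5), t) = 1;                % boxcar regressor for trial t
    end
    Y = X * randn(n_trials, n_voxels) + 0.5 * randn(n_scans, n_voxels);
    beta_hat = pinv(X) * Y;                        % least-squares trial-wise estimates
    features = beta_hat;                           % one row of features per trial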




Cross-validation

The generalization ability of a classifier can be estimated using a resampling procedure
known as cross-validation. One example is 2-fold cross-validation:


[Figure: 100 examples split into 2 folds; in each fold, half of the examples are held out as test examples (marked ?) while the classifier is trained on the rest.]
Cross-validation

A more commonly used variant is leave-one-out cross-validation.




[Figure: 100 examples and 100 folds; in each fold a single example (marked ?) is held out as the test example and the remaining 99 are used for training. The held-out predictions are then pooled for performance evaluation.]
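A hedged MATLAB sketch of leave-one-out cross-validation using cvpartition (a trials × voxels feature matrix Y and a label vector X are assumed to be in the workspace; the SVM could be replaced by any other classifier):

    n    = size(Y, 1);
    cvp  = cvpartition(n, 'LeaveOut');             % one fold per example
    pred = zeros(n, 1);
    for i = 1:cvp.NumTestSets
        tr  = training(cvp, i);                    % training examples of fold i
        te  = test(cvp, i);                        % held-out example of fold i
        mdl = fitcsvm(Y(tr, :), X(tr));
        pred(te) = predict(mdl, Y(te, :));
    end
    k = sum(pred == X);                            % correctly classified examples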
Performance evaluation

     Single-subject study with 𝒏 trials
  The most common approach is to assess how likely the obtained number of correctly
  classified trials could have occurred by chance.


Binomial test
p = P(X ≥ k | H0) = 1 − B(k − 1 | n, π0)
In MATLAB: p = 1 - binocdf(k-1, n, pi_0)

k    number of correctly classified trials
n    total number of trials
π0   chance level (typically 0.5)
B    binomial cumulative distribution function

[Figure: a single subject's n trials, each classified correctly (1) or incorrectly (0).]
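For illustration (hypothetical numbers): with k = 60 correct trials out of n = 100 at π0 = 0.5, p = 1 - binocdf(59, 100, 0.5) ≈ 0.028, so the accuracy is significantly above chance at the 5% level.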
Performance evaluation


[Figure: a group of m subjects drawn from a population; for each subject, each of n trials is classified correctly (1) or incorrectly (0).]
Performance evaluation

       Group study with 𝒎 subjects, 𝒏 trials each
In a group setting, we must account for both within-subjects (fixed-effects) and between-
subjects (random-effects) variance components.
Binomial test on concatenated data (fixed effects):
p = 1 − B(Σk | Σn, π0)

Binomial test on averaged data (fixed effects):
p = 1 − B((1/m)Σk | (1/m)Σn, π0)

t-test on summary statistics (random effects):
t = √m (π̄ − π0) / σ_{m−1},   p = 1 − t_{m−1}(t)

Bayesian mixed-effects inference (mixed effects; implementation available for download soon):
p = 1 − P(π > π0 | k, n)

π̄         sample mean of sample accuracies
σ_{m−1}   sample standard deviation
π0        chance level (typically 0.5)
t_{m−1}   cumulative Student's t-distribution with m − 1 degrees of freedom

Brodersen, Mathys, Chumbley, Daunizeau, Ong, Buhmann, Stephan (under review)
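A hedged MATLAB sketch of the random-effects summary-statistic approach (hypothetical subject-wise accuracies; chance level 0.5):

    acc = [0.61 0.57 0.72 0.66 0.58];              % hypothetical subject accuracies
    [~, p] = ttest(acc, 0.5, 'Tail', 'right');     % one-sided one-sample t-test
    % equivalently, by hand:
    m = numel(acc);
    t = sqrt(m) * (mean(acc) - 0.5) / std(acc);
    p_by_hand = 1 - tcdf(t, m - 1);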

Spatial deployment of informative regions

Which brain regions are jointly informative of a cognitive state of interest?

Searchlight approach
A sphere is passed across the brain. At each location, the classifier is evaluated using only the voxels in the current sphere → map of t-scores.
Nandy & Cordes (2003) MRM; Kriegeskorte et al. (2006) PNAS

Whole-brain approach
A constrained classifier is trained on whole-brain data. Its voxel weights are related to their empirical null distributions using a permutation test → map of t-scores.
Mourao-Miranda et al. (2005) NeuroImage; Lomakina et al. (in preparation)
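A schematic MATLAB sketch of the searchlight idea (the helpers sphere_indices and classify_cv are hypothetical placeholders, not SPM functions; Y is a trials × voxels matrix, X the label vector, and n_voxels and radius are assumed to be defined):

    acc_map = nan(n_voxels, 1);
    for v = 1:n_voxels
        idx        = sphere_indices(v, radius);    % voxels within the sphere at v (hypothetical)
        acc_map(v) = classify_cv(Y(:, idx), X);    % cross-validated accuracy (hypothetical)
    end
    % acc_map can then be tested against chance to form a statistical map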
Summary: research questions for classification

Overall classification accuracy
e.g. left or right button? truth or lie? healthy or ill?
[Figure: classification accuracy (chance = 50%) for each task.]

Spatial deployment of discriminative regions
[Figure: accuracy map across the brain, ranging from 55% to 80%.]

Temporal evolution of discriminability
[Figure: within-trial time course of accuracy; accuracy rises above chance around the time the participant indicates a decision.]

Model-based classification
e.g. assigning subjects to { group 1, group 2 } on the basis of a model fitted to their data.
Pereira et al. (2009) NeuroImage, Brodersen et al. (2009) The New Collection

Overview


1 Introduction

2 Classification

3 Multivariate Bayes

4 Model-based analyses




Multivariate Bayes

SPM brings multivariate analyses into the conventional inference framework of
Bayesian hierarchical models and their inversion.




[Photo: Mike West]




Multivariate Bayes

Multivariate analyses in SPM rest on the central tenet that inferences about how
the brain represents things reduce to model comparison.




some cause or consequence → decoding model → which coding hypothesis explains the data better: sparse coding in orbitofrontal cortex vs. distributed coding in prefrontal cortex?

To make the ill-posed regression problem tractable, MVB uses a prior on voxel
weights. Different priors reflect different coding hypotheses.

From encoding to decoding


Encoding model: GLM
A = Xβ
Y = TA + Gγ + ε
In summary: Y = TXβ + Gγ + ε

Decoding model: MVB
X = Aα
Y = TA + Gγ + ε
In summary: TX = Yα − Gγα − εα
Specifying the prior for MVB

1st level – spatial coding hypothesis U
U is an n (voxels) × u (patterns) matrix, and the voxel weights are expressed as pattern weights, α = Uη. Different choices of U encode different spatial hypotheses: e.g. a sparse U (voxel 2 is allowed to play a role on its own) or a smooth U (voxel 3 is allowed to play a role, but only if its neighbours play similar roles).

2nd level – pattern covariance structure Σ
p(η) = N(η | 0, Σ),  Σ = Σ_i λ_i s_i

Thus: p(α | λ) = N(α | 0, UΣU^T)  and  p(λ) = N(λ | π, Π^{−1})
Inverting the model

Model inversion involves finding the posterior distribution over voxel weights α. In MVB, this includes a greedy search for the optimal covariance structure that governs the prior over α:

Partition #1: subset s¹;  Σ = λ1 × (component for s¹)
Partition #2: subsets s¹, s²;  Σ = λ1 × (component for s¹) + λ2 × (component for s²)
Partition #3 (optimal): subsets s¹, s², s³;  Σ = λ1 × (component for s¹) + λ2 × (component for s²) + λ3 × (component for s³)
Example: decoding motion from visual cortex
MVB can be illustrated using SPM's attention-to-motion example dataset.

This dataset is based on a simple block design. There are three experimental factors:
photic    – display shows random dots
motion    – dots are moving
attention – subjects asked to pay attention

[Figure: design matrix (scans × regressors) with columns photic, motion, attention, const.]

Buechel & Friston 1999 Cerebral Cortex
Friston et al. 2008 NeuroImage


Multivariate Bayes in SPM




Step 1
After having specified and estimated
a model, use the Results button.

                                       Step 2
                                       Select the contrast to be decoded.

Multivariate Bayes in SPM
                            Step 3
                            Pick a region of interest.




Multivariate Bayes in SPM

Step 4
Multivariate Bayes can be invoked from within the Multivariate section.

Step 5
Here, the region of interest is specified as a sphere around the cursor. The spatial prior implements a sparse coding hypothesis.
Multivariate Bayes in SPM




Step 6
Results can be displayed using the BMS
button.



Observations vs. predictions


[Figure: observed vs. predicted values of the decoded contrast RX_c (motion).]
Model evidence and voxel weights




[Figure: MVB output showing the model log evidence (here, log evidence = 3) and the estimated voxel weights.]
Using MVB for point classification


                                     MVB may outperform
                                     conventional point
                                     classifiers when using a
                                     more appropriate coding
                                     hypothesis.




[Figure: decoding performance of MVB compared with a support vector machine.]
Summary: research questions for MVB

 Where does the brain represent things?       How does the brain represent things?
 Evaluating competing anatomical hypotheses   Evaluating competing coding hypotheses




Overview


1 Introduction

2 Classification

3 Multivariate Bayes

4 Model-based analyses




Classification approaches by data representation

Model-based classification
How do patterns of hidden quantities (e.g., connectivity among brain regions) differ between groups?

Activation-based classification
Which functional differences allow us to separate groups?

Structure-based classification
Which anatomical structures allow us to separate patients and healthy controls?
Generative embedding for model-based classification

step 1 — modelling: measurements from an individual subject are fitted with a subject-specific generative model (e.g. a DCM over regions A, B, C)
step 2 — embedding: the subject is represented in a model-based feature space (e.g. the connection strengths A→B, A→C, B→B, B→C)
step 3 — classification: a classification model is trained on these features
step 4 — evaluation: the discriminability of the groups is assessed (accuracy between 0 and 1)
step 5 — interpretation: which connection strengths are jointly discriminative?
 Brodersen, Haiss, Ong, Jung, Tittgemeyer, Buhmann, Weber, Stephan (2011) NeuroImage
 Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann, Stephan (2011) PLoS Comput Biol
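A hedged MATLAB sketch of generative embedding (the file names and group labels are hypothetical; an SPM-style DCM structure with posterior means DCM.Ep.A, .B, .C is assumed to exist on disk for each subject):

    subjects = {'DCM_s01.mat', 'DCM_s02.mat', 'DCM_s03.mat', 'DCM_s04.mat'}; % hypothetical
    labels   = [1; 1; 0; 0];                       % 1 = patient, 0 = control (hypothetical)
    features = [];
    for s = 1:numel(subjects)
        load(subjects{s}, 'DCM');                  % step 1: subject-specific model
        features(s, :) = [DCM.Ep.A(:); DCM.Ep.B(:); DCM.Ep.C(:)]';  % step 2: embedding
    end
    mdl = fitcsvm(features, labels);               % step 3: classification model
    cv  = crossval(mdl, 'Leaveout', 'on');         % step 4: evaluation
    acc = 1 - kfoldLoss(cv);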


Example: diagnosing stroke patients




[Figure: anatomical regions of interest shown on a coronal slice at y = –26 mm.]
Example: diagnosing stroke patients




[Figure: dynamic causal model of the auditory system, with bilateral medial geniculate body (MGB), Heschl's gyrus (HG, primary auditory cortex A1), and planum temporale (PT); stimulus input enters via MGB.]
Multivariate analysis: connectional fingerprints




[Figure: connectional fingerprints (model parameter profiles) of individual patients and controls.]
Dissecting diseases into physiologically distinct subgroups

[Figure: patients and controls plotted in a voxel-based contrast space (voxels at (64,–24,4), (–56,–20,10) and (–42,–26,10) mm) and, after generative embedding, in a model-based parameter space (connections L.HG → L.HG, R.HG → L.HG, L.MGB → L.MGB); the groups overlap in voxel space but separate clearly in parameter space.]

classification accuracy using all voxels in the regions of interest: 75%
classification accuracy using all 23 model parameters: 98%
Discriminative features in model space




[Figure: the auditory network model (bilateral MGB, HG/A1, PT; stimulus input via MGB).]
Discriminative features in model space




[Figure: the same network, with each connection marked as highly discriminative, somewhat discriminative, or not discriminative between the groups.]
Generative embedding and DCM

Question 1 – What do the data tell us about hidden processes in the brain?
→ compute the posterior:
p(θ | y, m) = p(y | θ, m) p(θ | m) / p(y | m)

Question 2 – Which model is best w.r.t. the observed fMRI data?
→ compute the model evidence:
p(m | y) ∝ p(y | m) p(m),  where p(y | m) = ∫ p(y | θ, m) p(θ | m) dθ

Question 3 – Which model is best w.r.t. an external criterion (e.g. classifying subjects into { patient, control })?
→ compute the classification accuracy:
p(h(y) = x) = ∫ p(h(y) = x | y, y_train, x_train) p(y) p(y_train) p(x_train) dy dy_train dx_train
Summary

          Classification
          • to assess whether a cognitive state is
            linked to patterns of activity
          • to assess the spatial deployment of
            discriminative activity

          Multivariate Bayes
          • to evaluate competing anatomical
            hypotheses
          • to evaluate competing coding hypotheses

          Model-based analyses
          • to assess whether groups differ in terms of
            patterns of connectivity
          • to generate new grouping hypotheses


