Multivariate analyses & decoding

Kay Henning Brodersen
Translational Neuromodeling Unit (TNU)
Institute for Biomedical Engineering, University of Zurich & ETH Zurich
Machine Learning and Pattern Recognition Group
Department of Computer Science, ETH Zurich
http://people.inf.ethz.ch/bkay
Why multivariate?

Univariate approaches are excellent for localizing activations in individual voxels.



[Figure: bar plots of responses in voxels v1 and v2 under the reward and no-reward conditions; a significant difference (*) in one case, no significant difference (n.s.) in the other.]
Why multivariate?

Multivariate approaches can be used to examine responses that are jointly encoded
in multiple voxels.


[Figure: bar plots of responses in voxels v1 and v2 for orange juice and apple juice show no significant differences (n.s.) in either voxel alone; plotted jointly in the v1–v2 plane, however, the two conditions form separable response patterns.]
Why multivariate?

Multivariate approaches can utilize ‘hidden’ quantities such as coupling strengths.


[Figure: schematic of a dynamic causal model. Hidden neural activities z1(t), z2(t), z3(t), driven by a driving input u1(t) and a modulatory input u2(t) and coupled by connection strengths, generate the observed BOLD signals x1(t), x2(t), x3(t).]
Friston, Harrison & Penny (2003) NeuroImage; Stephan & Friston (2007) Handbook of Brain Connectivity; Stephan et al. (2008) NeuroImage

Overview


1 Introduction

2 Classification

3 Multivariate Bayes

4 Model-based analyses




Encoding vs. decoding




An encoding model g: X_t → Y_t maps from context to brain activity; a decoding model h: Y_t → X_t maps from brain activity back to context.

context (cause or consequence, e.g. condition, stimulus, response, prediction error): X_t ∈ ℝ^d
BOLD signal: Y_t ∈ ℝ^v
Regression vs. classification


Regression model
independent variables (regressors) → f → continuous dependent variable

Classification model
independent variables (features) → f → categorical dependent variable (label)
Univariate vs. multivariate models

A univariate model considers a single voxel at a time:
context X_t ∈ ℝ^d → BOLD signal Y_t ∈ ℝ.
Spatial dependencies between voxels are only introduced afterwards, through random field theory.

A multivariate model considers many voxels at once:
context X_t ∈ ℝ^d → BOLD signal Y_t ∈ ℝ^v, v ≫ 1.
Multivariate models enable inferences on distributed responses without requiring focal activations.
Prediction vs. inference

The goal of prediction is to find a highly accurate encoding or decoding function, e.g. predicting a cognitive state using a brain-machine interface, or predicting a subject-specific diagnostic status.

predictive density: p(X_new | Y_new, X, Y) = ∫ p(X_new | Y_new, θ) p(θ | X, Y) dθ

The goal of inference is to decide between competing hypotheses, e.g. comparing a model that links distributed neuronal activity to a cognitive state with a model that does not, or weighing the evidence for sparse vs. distributed coding.

marginal likelihood (model evidence): p(X | Y) = ∫ p(X | Y, θ) p(θ) dθ
Goodness of fit vs. complexity

Goodness of fit is the degree to which a model explains observed data.
Complexity is the flexibility of a model (including, but not limited to, its number of
parameters).


[Figure: the same data fitted with models of increasing complexity (1 parameter: underfitting; 4 parameters: optimal; 9 parameters: overfitting); legend: truth, data, model.]
We wish to find the model that optimally trades off goodness of fit and complexity.

Bishop (2007) PRML
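To make the trade-off concrete, here is a minimal MATLAB sketch (not from the slides; the generating function, noise level, and polynomial orders are arbitrary choices for illustration):

    rng(1);
    x      = linspace(0, 1, 12)';                  % a few noisy observations
    y      = sin(2*pi*x) + 0.2*randn(size(x));
    x_fine = linspace(0, 1, 200)';
    orders = [1 4 9];                              % few vs. many parameters
    for i = 1:numel(orders)
        p     = polyfit(x, y, orders(i));          % least-squares polynomial fit
        y_hat = polyval(p, x_fine);
        subplot(1, 3, i); plot(x, y, 'ko', x_fine, y_hat, 'b-');
        title(sprintf('order %d', orders(i)));
    end

The low-order fit underfits the data, while the highest-order fit chases the noise (overfits).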


Summary of modelling terminology
                                                                        General Linear Model (GLM)
                                                                        • mass-univariate encoding model
                                                                        • to regress context onto brain activity
                                                                          and find clusters of similar effects

                                       Dynamic Causal Modelling (DCM)
                                       • multivariate encoding model
                                       • to evaluate connectivity
                                         hypotheses

                 Classification
                 • multivariate decoding model
                 • to predict a categorical context
                   label from brain activity

 Multivariate Bayes (MVB)
 • multivariate decoding model
 • to evaluate anatomical and
   coding hypotheses



Overview


1 Introduction

2 Classification

3 Multivariate Bayes

4 Model-based analyses




Constructing a classifier

A principled way of designing a classifier would be to adopt a probabilistic approach:


Y_t → f → that k which maximizes p(X_t = k | Y_t, X, Y)



In practice, classifiers differ in terms of how strictly they implement this principle.

Generative classifiers use Bayes' rule to estimate p(X_t | Y_t) ∝ p(Y_t | X_t) p(X_t).
Examples: Gaussian Naïve Bayes, Linear Discriminant Analysis.

Discriminative classifiers estimate p(X_t | Y_t) directly, without Bayes' theorem.
Examples: logistic regression, Relevance Vector Machine.

Discriminant classifiers estimate f(Y_t) directly.
Examples: Fisher's Linear Discriminant, Support Vector Machine.
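As a hedged illustration (not from the slides), a discriminative classifier such as logistic regression can be fitted in MATLAB with fitglm from the Statistics and Machine Learning Toolbox; the data here are simulated:

    Y   = randn(100, 10);                          % 100 trials x 10 voxel features
    X   = double(rand(100, 1) > 0.5);              % binary labels
    mdl = fitglm(Y, X, 'Distribution', 'binomial');% logistic regression
    p_hat  = predict(mdl, Y);                      % estimates of p(X_t = 1 | Y_t)
    labels = p_hat > 0.5;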
Support vector machine (SVM)

[Figure: example data in a two-dimensional voxel space (v1, v2); a linear SVM separates the classes with a straight decision boundary, a nonlinear SVM with a curved one.]
     Vapnik (1999) Springer; Schölkopf et al. (2002) MIT Press
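A minimal MATLAB sketch (simulated data; assumes the Statistics and Machine Learning Toolbox, not part of the slides):

    rng(0);
    Y = [randn(50, 2) + 1; randn(50, 2) - 1];      % 100 trials x 2 voxels
    X = [ones(50, 1); zeros(50, 1)];               % class labels
    svm_lin = fitcsvm(Y, X);                                 % linear SVM
    svm_rbf = fitcsvm(Y, X, 'KernelFunction', 'rbf');        % nonlinear SVM
    pred    = predict(svm_rbf, Y);                           % predicted labels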

Stages in a classification analysis




feature extraction → classification using cross-validation → performance evaluation
(e.g. Bayesian mixed-effects inference: p = 1 − P(π > π0 | k, n))
Feature extraction for trial-by-trial classification

We can obtain trial-wise estimates of neural activity by filtering the data with a GLM.


Y = Xβ + e, where the design matrix X contains one boxcar regressor per trial. The estimate of the coefficient for the trial-2 regressor (β2) then reflects activity on trial 2, and the vector of trial-wise estimates (β1, β2, …, βp) serves as the feature set.
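A hedged MATLAB sketch of this idea (simulated data, plain boxcars without HRF convolution for brevity; in practice the design matrix would come from the GLM specification, e.g. in SPM):

    n_scans = 200;  n_trials = 20;  n_voxels = 50;
    X = zeros(n_scans, n_trials);
    for t = 1:n_trials
        X((t-1)*10 + (1:5), t) = 1;                % boxcar regressor for trial t
    end
    Y = X * randn(n_trials, n_voxels) + 0.5 * randn(n_scans, n_voxels);
    beta_hat = pinv(X) * Y;                        % least-squares trial-wise estimates
    features = beta_hat;                           % one row of features per trial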




Cross-validation

The generalization ability of a classifier can be estimated using a resampling procedure
known as cross-validation. One example is 2-fold cross-validation:


[Figure: 100 examples split into 2 folds; in each fold, half of the examples are held out as test examples (marked ?) while the classifier is trained on the rest.]
Cross-validation

A more commonly used variant is leave-one-out cross-validation.




[Figure: 100 examples and 100 folds; in each fold a single example (marked ?) is held out as the test example and the remaining 99 are used for training. The held-out predictions are then pooled for performance evaluation.]
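A hedged MATLAB sketch of leave-one-out cross-validation using cvpartition (a trials × voxels feature matrix Y and a label vector X are assumed to be in the workspace; the SVM could be replaced by any other classifier):

    n    = size(Y, 1);
    cvp  = cvpartition(n, 'LeaveOut');             % one fold per example
    pred = zeros(n, 1);
    for i = 1:cvp.NumTestSets
        tr  = training(cvp, i);                    % training examples of fold i
        te  = test(cvp, i);                        % held-out example of fold i
        mdl = fitcsvm(Y(tr, :), X(tr));
        pred(te) = predict(mdl, Y(te, :));
    end
    k = sum(pred == X);                            % correctly classified examples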
Performance evaluation

     Single-subject study with 𝒏 trials
  The most common approach is to assess how likely the obtained number of correctly
  classified trials could have occurred by chance.


Binomial test
p = P(X ≥ k | H0) = 1 − B(k − 1 | n, π0)
In MATLAB: p = 1 - binocdf(k-1, n, pi_0)

k    number of correctly classified trials
n    total number of trials
π0   chance level (typically 0.5)
B    binomial cumulative distribution function

[Figure: a single subject's n trials, each classified correctly (1) or incorrectly (0).]
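For illustration (hypothetical numbers): with k = 60 correct trials out of n = 100 at π0 = 0.5, p = 1 - binocdf(59, 100, 0.5) ≈ 0.028, so the accuracy is significantly above chance at the 5% level.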
Performance evaluation


[Figure: a group of m subjects drawn from a population; for each subject, each of n trials is classified correctly (1) or incorrectly (0).]
Performance evaluation

       Group study with 𝒎 subjects, 𝒏 trials each
In a group setting, we must account for both within-subjects (fixed-effects) and between-
subjects (random-effects) variance components.
Binomial test on concatenated data (fixed effects):
p = 1 − B(Σk | Σn, π0)

Binomial test on averaged data (fixed effects):
p = 1 − B((1/m)Σk | (1/m)Σn, π0)

t-test on summary statistics (random effects):
t = √m (π̄ − π0) / σ_{m−1},   p = 1 − t_{m−1}(t)

Bayesian mixed-effects inference (mixed effects; implementation available for download soon):
p = 1 − P(π > π0 | k, n)

π̄         sample mean of sample accuracies
σ_{m−1}   sample standard deviation
π0        chance level (typically 0.5)
t_{m−1}   cumulative Student's t-distribution with m − 1 degrees of freedom

Brodersen, Mathys, Chumbley, Daunizeau, Ong, Buhmann, Stephan (under review)
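A hedged MATLAB sketch of the random-effects summary-statistic approach (hypothetical subject-wise accuracies; chance level 0.5):

    acc = [0.61 0.57 0.72 0.66 0.58];              % hypothetical subject accuracies
    [~, p] = ttest(acc, 0.5, 'Tail', 'right');     % one-sided one-sample t-test
    % equivalently, by hand:
    m = numel(acc);
    t = sqrt(m) * (mean(acc) - 0.5) / std(acc);
    p_by_hand = 1 - tcdf(t, m - 1);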

Spatial deployment of informative regions

Which brain regions are jointly informative of a cognitive state of interest?

Searchlight approach
A sphere is passed across the brain. At each location, the classifier is evaluated using only the voxels in the current sphere → map of t-scores.
Nandy & Cordes (2003) MRM; Kriegeskorte et al. (2006) PNAS

Whole-brain approach
A constrained classifier is trained on whole-brain data. Its voxel weights are related to their empirical null distributions using a permutation test → map of t-scores.
Mourao-Miranda et al. (2005) NeuroImage; Lomakina et al. (in preparation)
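A schematic MATLAB sketch of the searchlight idea (the helpers sphere_indices and classify_cv are hypothetical placeholders, not SPM functions; Y is a trials × voxels matrix, X the label vector, and n_voxels and radius are assumed to be defined):

    acc_map = nan(n_voxels, 1);
    for v = 1:n_voxels
        idx        = sphere_indices(v, radius);    % voxels within the sphere at v (hypothetical)
        acc_map(v) = classify_cv(Y(:, idx), X);    % cross-validated accuracy (hypothetical)
    end
    % acc_map can then be tested against chance to form a statistical map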
Summary: research questions for classification

Overall classification accuracy
e.g. left or right button? truth or lie? healthy or ill?
[Figure: classification accuracy (chance = 50%) for each task.]

Spatial deployment of discriminative regions
[Figure: accuracy map across the brain, ranging from 55% to 80%.]

Temporal evolution of discriminability
[Figure: within-trial time course of accuracy; accuracy rises above chance around the time the participant indicates a decision.]

Model-based classification
e.g. assigning subjects to { group 1, group 2 } on the basis of a model fitted to their data.
Pereira et al. (2009) NeuroImage, Brodersen et al. (2009) The New Collection

Overview


1 Introduction

2 Classification

3 Multivariate Bayes

4 Model-based analyses




Multivariate Bayes

SPM brings multivariate analyses into the conventional inference framework of
Bayesian hierarchical models and their inversion.




[Photo: Mike West]




Multivariate Bayes

Multivariate analyses in SPM rest on the central tenet that inferences about how
the brain represents things reduce to model comparison.




some cause or consequence → decoding model → which coding hypothesis explains the data better: sparse coding in orbitofrontal cortex vs. distributed coding in prefrontal cortex?

To make the ill-posed regression problem tractable, MVB uses a prior on voxel
weights. Different priors reflect different coding hypotheses.

From encoding to decoding


Encoding model: GLM
A = Xβ
Y = TA + Gγ + ε
In summary: Y = TXβ + Gγ + ε

Decoding model: MVB
X = Aα
Y = TA + Gγ + ε
In summary: TX = Yα − Gγα − εα
Specifying the prior for MVB

1st level – spatial coding hypothesis U
U is an n (voxels) × u (patterns) matrix, and the voxel weights are expressed as pattern weights, α = Uη. Different choices of U encode different spatial hypotheses: e.g. a sparse U (voxel 2 is allowed to play a role on its own) or a smooth U (voxel 3 is allowed to play a role, but only if its neighbours play similar roles).

2nd level – pattern covariance structure Σ
p(η) = N(η | 0, Σ),  Σ = Σ_i λ_i s_i

Thus: p(α | λ) = N(α | 0, UΣU^T)  and  p(λ) = N(λ | π, Π^{−1})
Inverting the model

Model inversion involves finding the posterior distribution over voxel weights α. In MVB, this includes a greedy search for the optimal covariance structure that governs the prior over α:

Partition #1: subset s¹;  Σ = λ1 × (component for s¹)
Partition #2: subsets s¹, s²;  Σ = λ1 × (component for s¹) + λ2 × (component for s²)
Partition #3 (optimal): subsets s¹, s², s³;  Σ = λ1 × (component for s¹) + λ2 × (component for s²) + λ3 × (component for s³)
Example: decoding motion from visual cortex
MVB can be illustrated using SPM's attention-to-motion example dataset.

This dataset is based on a simple block design. There are three experimental factors:
photic    – display shows random dots
motion    – dots are moving
attention – subjects asked to pay attention

[Figure: design matrix (scans × regressors) with columns photic, motion, attention, const.]

Buechel & Friston 1999 Cerebral Cortex
Friston et al. 2008 NeuroImage


Multivariate Bayes in SPM




Step 1
After having specified and estimated
a model, use the Results button.

                                       Step 2
                                       Select the contrast to be decoded.

Multivariate Bayes in SPM
                            Step 3
                            Pick a region of interest.




Multivariate Bayes in SPM

Step 4
Multivariate Bayes can be invoked from within the Multivariate section.

Step 5
Here, the region of interest is specified as a sphere around the cursor. The spatial prior implements a sparse coding hypothesis.
Multivariate Bayes in SPM




Step 6
Results can be displayed using the BMS
button.



Observations vs. predictions


[Figure: observed vs. predicted values of the decoded contrast RX_c (motion).]
Model evidence and voxel weights




[Figure: MVB output showing the model log evidence (here, log evidence = 3) and the estimated voxel weights.]
Using MVB for point classification


                                     MVB may outperform
                                     conventional point
                                     classifiers when using a
                                     more appropriate coding
                                     hypothesis.




[Figure: decoding performance of MVB compared with a support vector machine.]
Summary: research questions for MVB

 Where does the brain represent things?       How does the brain represent things?
 Evaluating competing anatomical hypotheses   Evaluating competing coding hypotheses




Overview


1 Introduction

2 Classification

3 Multivariate Bayes

4 Model-based analyses




Classification approaches by data representation

Model-based classification
How do patterns of hidden quantities (e.g., connectivity among brain regions) differ between groups?

Activation-based classification
Which functional differences allow us to separate groups?

Structure-based classification
Which anatomical structures allow us to separate patients and healthy controls?
Generative embedding for model-based classification

step 1 — modelling: measurements from an individual subject are fitted with a subject-specific generative model (e.g. a DCM over regions A, B, C)
step 2 — embedding: the subject is represented in a model-based feature space (e.g. the connection strengths A→B, A→C, B→B, B→C)
step 3 — classification: a classification model is trained on these features
step 4 — evaluation: the discriminability of the groups is assessed (accuracy between 0 and 1)
step 5 — interpretation: which connection strengths are jointly discriminative?
 Brodersen, Haiss, Ong, Jung, Tittgemeyer, Buhmann, Weber, Stephan (2011) NeuroImage
 Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann, Stephan (2011) PLoS Comput Biol
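A hedged MATLAB sketch of generative embedding (the file names and group labels are hypothetical; an SPM-style DCM structure with posterior means DCM.Ep.A, .B, .C is assumed to exist on disk for each subject):

    subjects = {'DCM_s01.mat', 'DCM_s02.mat', 'DCM_s03.mat', 'DCM_s04.mat'}; % hypothetical
    labels   = [1; 1; 0; 0];                       % 1 = patient, 0 = control (hypothetical)
    features = [];
    for s = 1:numel(subjects)
        load(subjects{s}, 'DCM');                  % step 1: subject-specific model
        features(s, :) = [DCM.Ep.A(:); DCM.Ep.B(:); DCM.Ep.C(:)]';  % step 2: embedding
    end
    mdl = fitcsvm(features, labels);               % step 3: classification model
    cv  = crossval(mdl, 'Leaveout', 'on');         % step 4: evaluation
    acc = 1 - kfoldLoss(cv);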


Example: diagnosing stroke patients




[Figure: anatomical regions of interest shown on a coronal slice at y = –26 mm.]
Example: diagnosing stroke patients




[Figure: dynamic causal model of the auditory system, with bilateral medial geniculate body (MGB), Heschl's gyrus (HG, primary auditory cortex A1), and planum temporale (PT); stimulus input enters via MGB.]
Multivariate analysis: connectional fingerprints




[Figure: connectional fingerprints (model parameter profiles) of individual patients and controls.]
Dissecting diseases into physiologically distinct subgroups

[Figure: patients and controls plotted in a voxel-based contrast space (voxels at (64,–24,4), (–56,–20,10) and (–42,–26,10) mm) and, after generative embedding, in a model-based parameter space (connections L.HG → L.HG, R.HG → L.HG, L.MGB → L.MGB); the groups overlap in voxel space but separate clearly in parameter space.]

classification accuracy using all voxels in the regions of interest: 75%
classification accuracy using all 23 model parameters: 98%
Discriminative features in model space




[Figure: the auditory network model (bilateral MGB, HG/A1, PT; stimulus input via MGB).]
Discriminative features in model space




[Figure: the same network, with each connection marked as highly discriminative, somewhat discriminative, or not discriminative between the groups.]
Generative embedding and DCM

Question 1 – What do the data tell us about hidden processes in the brain?
→ compute the posterior:
p(θ | y, m) = p(y | θ, m) p(θ | m) / p(y | m)

Question 2 – Which model is best w.r.t. the observed fMRI data?
→ compute the model evidence:
p(m | y) ∝ p(y | m) p(m),  where p(y | m) = ∫ p(y | θ, m) p(θ | m) dθ

Question 3 – Which model is best w.r.t. an external criterion (e.g. classifying subjects into { patient, control })?
→ compute the classification accuracy:
p(h(y) = x) = ∫ p(h(y) = x | y, y_train, x_train) p(y) p(y_train) p(x_train) dy dy_train dx_train
Summary

          Classification
          • to assess whether a cognitive state is
            linked to patterns of activity
          • to assess the spatial deployment of
            discriminative activity

          Multivariate Bayes
          • to evaluate competing anatomical
            hypotheses
          • to evaluate competing coding hypotheses

          Model-based analyses
          • to assess whether groups differ in terms of
            patterns of connectivity
          • to generate new grouping hypotheses


