Considerate Approaches to ABC Model Selection




                                             Michael P.H. Stumpf, Christopher
                                           Barnes, Sarah Filippi, Thomas Thorne

                                                      Theoretical Systems Biology Group


                                                                 26/06/2012




 Considerate Approaches to ABC Model Selection   Stumpf et al.                            1 of 15
Evolving Networks




[Figure: four models of network evolution. (a) Duplication attachment; (b) Duplication attachment with complementarity; (c) Linear preferential attachment; (d) General scale-free, with labels w_i and w_j.]
Inference and Model Selection
We have observed data, D, that were generated by some system that
we seek to describe by a mathematical model. In principle we can
have a model set, M = {M_1, ..., M_ν}, where each model M_i has an
associated parameter θ_i.
We may know the different constituent parts of the system, X_i, and
have measurements for some or all of them under some experimental
designs, T.
Model Posterior
\[
  \underbrace{\Pr(M_i \mid T, D)}_{\text{model posterior}}
  = \frac{\overbrace{\Pr(D \mid M_i, T)}^{\text{likelihood}}\,
          \overbrace{\pi(M_i)}^{\text{prior}}}
         {\underbrace{\sum_{j=1}^{\nu} \Pr(D \mid M_j, T)\,\pi(M_j)}_{\text{evidence}}}
\]
For complicated models and/or detailed data the likelihood
evaluation can become prohibitively expensive.

Approximate Inference
We can approximate the likelihood and/or the models. The “true”
model is unlikely to be in M anyway.
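
As a minimal sketch of the model posterior formula, suppose the log marginal likelihoods log Pr(D | M_j, T) were already available for each model. The function name `model_posterior` and its arguments are illustrative assumptions, not part of the talk.

```python
import numpy as np

def model_posterior(log_evidence, prior):
    """Normalise Pr(D | M_j, T) * pi(M_j) over the model set.

    log_evidence: log Pr(D | M_j, T) for each model (assumed to be given).
    prior:        pi(M_j) for each model.
    """
    log_evidence = np.asarray(log_evidence, dtype=float)
    prior = np.asarray(prior, dtype=float)
    log_joint = log_evidence + np.log(prior)
    log_joint -= log_joint.max()          # shift for numerical stability
    weights = np.exp(log_joint)
    return weights / weights.sum()        # the denominator plays the role of the evidence

# Two hypothetical models with equal prior weight.
posterior = model_posterior([-10.0, -12.0], [0.5, 0.5])
```

Working on the log scale avoids underflow when the marginal likelihoods are very small, which is common for detailed data.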

Approximate Bayesian Computation
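We can define the posterior as
\[
  p(\theta_i \mid x) = \frac{f(x \mid \theta_i)\,\pi(\theta_i)}{p(x)}
\]
Here f(x | θ_i) is the likelihood, which is often hard to evaluate; consider
for example
\[
  \tilde{y} = \max[0,\; y + g_1 + y \times g_2]
  \quad\text{with}\quad g_1, g_2 \sim N(0, \sigma_{1/2})
  \quad\text{and}\quad \frac{dy}{dt} = g(y; \theta).
\]
But we can still simulate from the data-generating model, whence
\[
  p(\theta_i \mid x)
  = \int_X \frac{\mathbf{1}(y = x)\, f(y \mid \theta_i)\,\pi(\theta_i)}{p(x)}\, dy
  \approx \int_X \frac{\mathbf{1}\bigl(\Delta(y, x) < \epsilon\bigr)\, f(y \mid \theta_i)\,\pi(\theta_i)}{p(x)}\, dy
\]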

Solutions for Complex Problems (?)
Approximate (i) data, (ii) model or (iii) distance.
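
The approximation above can be sketched as a plain ABC rejection sampler: draw θ from the prior, simulate y, and keep θ whenever Δ(y, x) < ε. Everything in the toy problem below (normal data with unit variance, a uniform prior on [−4, 4], a distance on sorted samples, ε = 0.5) is an assumption chosen for illustration.

```python
import numpy as np

def abc_rejection(x_obs, sample_prior, simulate, distance, eps, n_draws, rng):
    """Keep the prior draws whose simulated data lie within eps of x_obs."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)          # theta ~ pi(theta)
        y = simulate(theta, rng)           # y ~ f(y | theta)
        if distance(y, x_obs) < eps:       # indicator 1(Delta(y, x) < eps)
            accepted.append(theta)
    return np.array(accepted)

# Toy problem (hypothetical): normal data with unknown mean and unit variance.
rng = np.random.default_rng(0)
x_obs = rng.normal(1.0, 1.0, size=20)

posterior = abc_rejection(
    x_obs,
    sample_prior=lambda r: r.uniform(-4.0, 4.0),
    simulate=lambda theta, r: r.normal(theta, 1.0, size=x_obs.size),
    distance=lambda y, x: np.mean(np.abs(np.sort(y) - np.sort(x))),
    eps=0.5,
    n_draws=5000,
    rng=rng,
)
```

The accepted values are samples from the approximate posterior; shrinking ε tightens the approximation at the cost of fewer acceptances.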

ABC with Summary Statistics
If the data, D, are very complex and detailed, direct comparison
between real and simulated data becomes prohibitive. In such
situations, which originally motivated ABC approaches, summary
statistics of the data are compared. We then have
\[
  p_{S,\epsilon}(\theta_i \mid D) \propto \int_X \mathbf{1}\bigl(\Delta(S(x), S(y_\theta)) < \epsilon\bigr)\, f(y \mid \theta_i)\,\pi(\theta_i)\, dy
\]
Sufficient Statistics
This only works if the statistic S(.) is sufficient, i.e. if for s = S(x) we
have
\[
  p(x \mid s, \theta) = p(x \mid s)
\]

Sufficiency for Model Selection
If S(.) is sufficient for parameter estimation (in all models i
considered) it is not necessarily sufficient for model selection (Robert
et al., PNAS (2011)).
ABC with Summary Statistics
Generate data X ∼ N(1, 1) and use ABC to infer µ (assuming that
σ² = 1 is known).

[Figure: four histograms of the ABC posterior, one per summary statistic (mean, var, min, max); x-axis: θ from −4 to 4.]

Role of Summary Statistics
Mean (sufficient) correctly infers µ.
Max/Min capture some information on µ.
Var fails to capture any information on µ.

We need a way of constructing sets of statistics that together are
(approximately) sufficient.
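
The contrast between the mean and the variance as summary statistics can be reproduced with a small sketch. The uniform prior on [−4, 4], the sample size of 50, the tolerance of 0.1, and the helper name `abc_with_summary` are all assumptions made for this illustration.

```python
import numpy as np

def abc_with_summary(x_obs, summary, eps, n_draws, rng):
    """ABC rejection for mu, comparing a single summary statistic."""
    s_obs = summary(x_obs)
    theta = rng.uniform(-4.0, 4.0, size=n_draws)               # assumed prior on mu
    y = rng.normal(theta[:, None], 1.0, size=(n_draws, x_obs.size))
    s_sim = np.apply_along_axis(summary, 1, y)                 # one statistic per data set
    return theta[np.abs(s_sim - s_obs) < eps]

rng = np.random.default_rng(1)
x_obs = rng.normal(1.0, 1.0, size=50)

post_mean = abc_with_summary(x_obs, np.mean, 0.1, 20000, rng)  # sufficient for mu
post_var = abc_with_summary(x_obs, np.var, 0.1, 20000, rng)    # carries no information on mu
```

With the mean, the accepted values concentrate around the sample mean; with the variance, acceptance does not depend on µ, so the accepted values remain spread over the whole prior range.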


A Closer Look at Summary Statistics
We interpret a summary statistic as a function,
\[
  S : \mathbb{R}^d \to \mathbb{R}^w, \qquad S(x) = s.
\]
If S is sufficient then (we include the model indicator variable in θ)
\[
  p(\theta \mid x) = p(\theta \mid s)
\]
Information Theoretical Perspective
A summary statistic is an information compression device. Now let S
be a set of statistics which together are sufficient. Then the mutual
information satisfies
\[
  I(\Theta; X) = \int_\Omega \int_X p(\theta, x) \log \frac{p(\theta, x)}{p(\theta)\,p(x)}\, d\theta\, dx = I(\Theta; S)
\]

Constructing Minimally Sufficient Summary Statistics
We seek the set U ⊆ S with minimal cardinality such that
I(Θ; S) = I(Θ; U).
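
The identity I(Θ; X) = I(Θ; S) for a sufficient statistic can be checked exactly on a small discrete toy problem. The setting below is hypothetical: θ takes one of two values with equal prior weight, X is a pair of Bernoulli(θ) draws, the sum is sufficient, and the first draw alone is not.

```python
import itertools
import numpy as np

def mutual_information(joint):
    """I(A; B) in nats for a joint probability table joint[a, b]."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])))

thetas = [0.2, 0.8]                                   # assumed parameter values
prior = [0.5, 0.5]
outcomes = list(itertools.product([0, 1], repeat=2))  # all possible data sets x

def joint_table(statistic):
    """Joint distribution of theta and statistic(x)."""
    values = sorted({statistic(x) for x in outcomes})
    table = np.zeros((len(thetas), len(values)))
    for i, (theta, p) in enumerate(zip(thetas, prior)):
        for x in outcomes:
            px = np.prod([theta if xi else 1.0 - theta for xi in x])
            table[i, values.index(statistic(x))] += p * px
    return table

mi_full = mutual_information(joint_table(lambda x: x))      # I(Theta; X)
mi_sum = mutual_information(joint_table(sum))               # I(Theta; S), S sufficient
mi_first = mutual_information(joint_table(lambda x: x[0]))  # I(Theta; U), U not sufficient
```

Here `mi_sum` matches `mi_full`, while `mi_first` is strictly smaller, which is the sense in which a non-sufficient statistic loses information about θ.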

Constructing Sufficient Statistics


Proposition
Let X be a random variable generated according to f(·|θ). Let S be a
summary statistic and U and T two subsets of S such that U = U(X),
T = T(X) and S = S(X) satisfy U ⊂ T ⊂ S. We have
\[
  I(\Theta; S \mid T) = I(\Theta; S \mid U) - I(\Theta; T \mid U).
\]
In order to construct a subset T of S such that I(Θ; S | T) = 0, it is thus
sufficient to add statistics from S one by one until the condition holds.
If we denote by S_(k) the kth statistic to be added (with k ≤ w) we have
S_(k) = S_(k)(X), and then
\[
  I(\Theta; S \mid S_{(1)}, \ldots, S_{(k+1)}) \le I(\Theta; S \mid S_{(1)}, \ldots, S_{(k)}).
\]
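
One way to see where the proposition comes from is the chain rule for conditional mutual information. The sketch below assumes, as in the statement, that U ⊂ T ⊂ S, so that knowing S already fixes T and conditioning on T already fixes U.

```latex
\begin{align*}
I(\Theta; S \mid U)
  &= I(\Theta; T, S \mid U)
     && \text{($T$ is determined by $S$)} \\
  &= I(\Theta; T \mid U) + I(\Theta; S \mid T, U)
     && \text{(chain rule)} \\
  &= I(\Theta; T \mid U) + I(\Theta; S \mid T)
     && \text{($U$ is determined by $T$)}
\end{align*}
```

Rearranging gives the stated identity. Since I(Θ; T | U) ≥ 0, the residual information I(Θ; S | T) can never exceed I(Θ; S | U), which is why adding statistics one by one cannot increase it.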


Constructing Sufficient Statistics


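\begin{align*}
I(\Theta; S \mid U)
  &= \int_\Omega \int_X p(\theta, S(x), U(x))
     \log \frac{p(\theta, S(x) \mid U(x))}{p(\theta \mid U(x))\, p(S(x) \mid U(x))}\, dx\, d\theta \\
  &= \int_X p(x)\, \mathrm{KL}\bigl(p(\Theta \mid S(x)) \,\|\, p(\Theta \mid U(x))\bigr)\, dx \\
  &= E_{p(X)}\bigl[\mathrm{KL}\bigl(p(\Theta \mid S(X)) \,\|\, p(\Theta \mid U(X))\bigr)\bigr]
\end{align*}

An Impossible Algorithm
• for all subsets u* ⊆ s*, perform ABC to obtain estimates p(Θ | u*)
• determine the set
  A = {u* ⊂ s* such that KL(p(Θ | s*) || p(Θ | u*)) = 0},
• the desired subset is argmin_{u* ∈ A} |u*|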
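
Each step of this procedure needs a Kullback-Leibler divergence between two ABC posteriors, which are only available as samples, and there are 2^w subsets to visit for w statistics. A rough sketch of a sample-based estimate for a scalar parameter follows; the histogram binning and the add-one smoothing are assumptions of this illustration, not the estimator used in the talk.

```python
import numpy as np

def kl_from_samples(p_samples, q_samples, bins=30):
    """Histogram estimate of KL(p || q) from two sets of scalar samples."""
    p_samples = np.asarray(p_samples, dtype=float)
    q_samples = np.asarray(q_samples, dtype=float)
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    edges = np.linspace(lo, hi, bins + 1)              # common bins for both sample sets
    # Add one pseudo-count per bin so that empty bins do not give log(0).
    p = np.histogram(p_samples, bins=edges)[0] + 1.0
    q = np.histogram(q_samples, bins=edges)[0] + 1.0
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=5000)
b = rng.normal(0.0, 1.0, size=5000)   # same distribution as a
c = rng.normal(3.0, 1.0, size=5000)   # shifted distribution

kl_same = kl_from_samples(a, a)
kl_near = kl_from_samples(a, b)
kl_far = kl_from_samples(a, c)
```

Because such estimates are noisy, an exact test for KL = 0 is not practical, which is one reason to work with a threshold instead.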


Constructing Sufficient Statistics
 input: a sufficient set of statistics whose values on the dataset are
        s* = {s*_1, ..., s*_w}, and a threshold δ
 output: a subset v* of s*
 choose u* randomly in s*
 v* ← {u*}
 q* ← s* \ v*
 repeat
       repeat
             if q* = Ø then return v*
             end if
             choose u* randomly in q*
             q* ← q* \ {u*}
             perform ABC to obtain p(Θ | v*, u*)
       until KL(p(Θ | v*, u*) || p(Θ | v*)) ≥ δ
       optionally: v* ← OrderDependency(v*, u*)
       v* ← v* ∪ {u*}
       q* ← s* \ v*
 until q* = Ø
 return v*
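
The greedy procedure above can be sketched in Python as follows. The ABC run and the divergence estimate are passed in as callables; the names `select_statistics`, `posterior_samples` and `kl_divergence`, and the toy stand-ins used to exercise the loop, are assumptions of this sketch. The optional OrderDependency step is left out.

```python
import numpy as np

def select_statistics(n_stats, posterior_samples, kl_divergence, delta, rng):
    """Greedy selection of statistics, indexed 0 .. n_stats - 1.

    posterior_samples(subset) -> samples from the ABC posterior p(theta | subset)
    kl_divergence(p, q)       -> estimate of KL(p || q) from two sample sets
    """
    all_stats = set(range(n_stats))
    selected = [int(rng.choice(sorted(all_stats)))]    # random starting statistic
    candidates = all_stats - set(selected)
    while candidates:
        current = posterior_samples(tuple(selected))
        added = False
        while candidates:
            u = int(rng.choice(sorted(candidates)))
            candidates.discard(u)
            proposed = posterior_samples(tuple(selected) + (u,))
            if kl_divergence(proposed, current) >= delta:
                selected.append(u)                     # u changes the posterior: keep it
                added = True
                break
        if not added:
            break                                      # no remaining statistic adds information
        candidates = all_stats - set(selected)         # rejected statistics get another chance
    return selected

def toy_posterior(subset):
    # Hypothetical stand-in for an ABC run: only statistic 0 is informative.
    return np.array([0.9, 1.0, 1.1]) if 0 in subset else np.array([-2.0, 0.0, 2.0])

def toy_divergence(p, q):
    # Hypothetical stand-in for a KL estimate: zero when the sample sets coincide.
    return float(abs(p.mean() - q.mean()) + abs(p.std() - q.std()))

chosen = select_statistics(3, toy_posterior, toy_divergence,
                           delta=0.1, rng=np.random.default_rng(0))
```

As in the pseudocode, the randomly chosen starting statistic is never removed, so the result may contain one uninformative statistic alongside the informative ones.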
Examples: Normal Distributions


y_1, ..., y_d ∼ N(µ, σ_1²) and y_1, ..., y_d ∼ N(µ, σ_2²)

[Figure: two panels; x-axis: statistics mean, S², range, max, random; y-axis: Run (up to 100).]

[Figure: two scatter plots of log(BF) ABC (y-axis) against log(BF) predicted (x-axis).]
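
For the two normal models above, ABC model selection can be sketched by sampling the model index together with µ and counting acceptances per model. The uniform prior on µ over [−4, 4], the equal model prior, the choice of (mean, S²) as statistics, and all numerical settings are assumptions made for this illustration.

```python
import numpy as np

def abc_model_posterior(y_obs, sigmas, eps, n_draws, rng):
    """Estimate Pr(M_i | data) by ABC rejection over the joint (model, mu) space."""
    s_obs = np.array([y_obs.mean(), y_obs.var(ddof=1)])
    accepted = np.zeros(len(sigmas), dtype=int)
    for _ in range(n_draws):
        m = rng.integers(len(sigmas))                  # uniform prior over models
        mu = rng.uniform(-4.0, 4.0)                    # assumed prior on mu
        y = rng.normal(mu, sigmas[m], size=y_obs.size)
        s = np.array([y.mean(), y.var(ddof=1)])
        if np.linalg.norm(s - s_obs) < eps:
            accepted[m] += 1
    if accepted.sum() == 0:
        raise ValueError("no simulations accepted; increase eps or n_draws")
    return accepted / accepted.sum()

rng = np.random.default_rng(2)
y_obs = rng.normal(1.0, 1.0, size=50)                  # data generated with sigma = 1
model_probs = abc_model_posterior(y_obs, sigmas=(1.0, 3.0), eps=0.5,
                                  n_draws=4000, rng=rng)
```

With equal model priors, the ratio of the two entries of `model_probs` is the ABC estimate of the Bayes factor, provided both models have accepted simulations.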
Examples: Population Genetics
Constant Population                                                               Exponential                                                               Two-Island Model
       Size                                                                    Population Growth                                                             with Migration
      100
                                                                             100                                                                      100




      80
                                                                             80                                                                       80




      60
                                                                             60                                                                       60
Run




                                                                       Run




                                                                                                                                                Run
      40
                                                                             40                                                                       40




      20
                                                                             20                                                                       20




             S1   S2   S3    S4   S5   S6   S7   S8   S9   S10   S11
                                                                                   S1   S2   S3   S4   S5    S6   S7   S8   S9   S10   S11                   S1   S2   S3   S4   S5   S6   S7   S8   S9   S10   S11




            [S1] Number of Segregating Sites; [S2] Number of Distinct Haplotypes,; [S3] Haplotype Homozygosity; [S4] Average SNP
            Homozygosity; [S5] Number of occurrences of most common haplotype; [S6] Mean number of pair-wise differences between
            haplotypes; [S7] Number of Singleton Haplotypes; [S8] Number of Singleton SNPs; [S9] Linkage Disequilibrium.


            Summary Statistic Choice
            The choice of summary statistics appears to depend subtly on the
            true data-generating model. In light of coalescent processes this is,
            however, to be expected.

Examples: Random Walks

[Figure: three panels, "Classical Random Walk", "Persistent Random Walk" and "Biased Random Walk"; x-axis: summary statistics S1 to S5; y-axis: Run (ticks 20 to 100).]




            [S1] Mean square displacement; [S2] Mean x and y displacement; [S3] Mean square x and y displacement; [S4] Straightness
            index; [S5] Eigenvalues of gyration tensor.


            Parameter Sufficiency for Complex Problems
            Here all statistics that have been chosen for parameter estimation are
            also chosen for model selection.


Conditioning on Information

[Diagram: parameter space Θ, with the statistics s1, s2 and the full data x.]

Statistics
  Sufficient: Implicates same area as full data.
   Ancillary: Implicates all values of θ equally.

What is the meaning of p(θ|s0, s1, ..., sn)?
Let s = (s0, s1, ..., sn), and assume I(θ, s) < I(θ, x) but ε → 0.
This can happen for sufficient and ancillary s. In the latter case we obtain

    p(θ|s) = π(θ).

How about

    p(t|s)

if s is not (quite) sufficient?
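
The contrast between sufficient and ancillary statistics can be illustrated on a toy problem. The sketch below is not taken from the talk: it assumes a unit-variance normal model with unknown mean θ, a uniform prior on [-5, 5], and plain rejection ABC; the function name `abc_rejection` and all numerical settings are hypothetical choices. The sample mean is sufficient for θ in this model, whereas the sample variance is ancillary, so conditioning on it should return (approximately) the prior.

```python
import numpy as np

def abc_rejection(observed, statistic, eps, n_draws=20000, seed=0):
    """Rejection ABC for the mean of a unit-variance normal (toy model, assumed)."""
    rng = np.random.default_rng(seed)
    n = observed.size
    # Draw candidate parameters from the (assumed) uniform prior pi(theta).
    theta = rng.uniform(-5.0, 5.0, size=n_draws)
    # Simulate one data set of size n for every candidate parameter.
    sims = rng.normal(theta[:, None], 1.0, size=(n_draws, n))
    # Keep the parameters whose simulated statistic is within eps of the observed one.
    keep = np.abs(statistic(sims, axis=1) - statistic(observed)) < eps
    return theta[keep]

data_rng = np.random.default_rng(1)
x = data_rng.normal(2.0, 1.0, size=50)          # stand-in for observed data, true theta = 2

post_suff = abc_rejection(x, np.mean, eps=0.1)  # sufficient statistic: sample mean
post_anc = abc_rejection(x, np.var, eps=0.1)    # ancillary statistic: sample variance

print("sufficient:", post_suff.mean(), post_suff.std())
print("ancillary: ", post_anc.mean(), post_anc.std())
```

With the sample mean the accepted values concentrate around the observed mean; with the sample variance their spread stays close to that of the prior, which is the p(θ|s) = π(θ) case.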
Model Selection vs. Model Checking

Model Selection: Several models M ∈ M are compared and one or
              more are chosen in light of the data: Find models which
              are better than others.
Model Checking: The quality of a model Mi is assessed against the
              available data: Determine if a model is actually ‘good’.
Alternative Approach: ABCµ [Ratmann et al., PNAS].

Posterior Predictive Checks
We are interested in the posterior predictive distribution,

$$
p(t(X) \mid s(X)) = \int_{\Theta} p(t(X) \mid \theta)\, p(\theta \mid s(X))\, d\theta .
$$
In particular we have

                                p(s(X )|s(X )) = p(s(X )|X )
unless t (X ) is sufficient.
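
A Monte Carlo version of the posterior predictive integral can be sketched as follows. This is an illustration under assumptions that are not in the talk: a unit-variance normal toy model, a uniform prior on [-5, 5], rejection ABC with the sample mean as s, and the sample maximum as the checking statistic t. All names and tolerances are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(2.0, 1.0, size=n)       # stand-in for the observed data X

def s(data, axis=-1):
    """Statistic used for inference (assumed: sample mean)."""
    return np.mean(data, axis=axis)

def t(data, axis=-1):
    """Statistic used for checking (assumed: sample maximum)."""
    return np.max(data, axis=axis)

# Step 1: rejection ABC yields approximate draws from p(theta | s(X)).
theta = rng.uniform(-5.0, 5.0, size=20000)
sims = rng.normal(theta[:, None], 1.0, size=(theta.size, n))
theta_post = theta[np.abs(s(sims) - s(x)) < 0.1]

# Step 2: for each posterior draw, simulate a replicate data set and compute t.
# The resulting values are draws from the posterior predictive p(t(X) | s(X)).
replicates = rng.normal(theta_post[:, None], 1.0, size=(theta_post.size, n))
t_rep = t(replicates)

# Step 3: locate the observed t(X) within the predictive distribution.
p_value = np.mean(t_rep >= t(x))
print("accepted draws:", theta_post.size, "predictive tail probability:", p_value)
```

A tail probability close to 0 or 1 would indicate that the model struggles to reproduce t(X), even if it matches s(X) well.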

ABC on Network Data

[Figure: four models of network evolution: (e) Duplication attachment; (f) Duplication attachment with complementarity; (g) Linear preferential attachment; (h) General scale-free (labels wi, wj).]

Summarizing Networks
• Data are noisy and incomplete.
• We can simulate models of network
  evolution, but likelihoods can be
  calculated only for very trivial
  models.
• There is also no sufficient statistic that
  would allow us to summarize networks,
  so ABC approaches require some
  thought.
• Many possible summary statistics of
  networks are expensive to calculate.
    Full likelihood: Wiuf et al., PNAS (2006).
             ABC: Ratmann et al., PLoS Comp. Biol. (2008).
     ABC (better): Thorne & Stumpf, J. Roy. Soc. Interface (2012).
                   Stumpf & Wiuf, J. Roy. Soc. Interface (2010).

Spectral Distances
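[Figure: undirected graph on the nodes a, b, c, d, e with edges a-b, a-c, a-d, b-c, b-d and d-e.]

With rows and columns ordered a, b, c, d, e, its adjacency matrix is

$$
A = \begin{pmatrix}
0 & 1 & 1 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix}
$$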

Graph Spectra
Given a graph G with nodes N and edges (i, j) ∈ E with i, j ∈ N, the
adjacency matrix, A, of the graph is defined by

$$
a_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E, \\ 0 & \text{otherwise.} \end{cases}
$$

The eigenvalues, λ, of this matrix provide one way of defining the
graph spectrum.
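
As a small worked example, the adjacency matrix and spectrum of the five-node example graph can be computed with NumPy. The edge list below is read off the adjacency matrix on this slide; the variable names are illustrative only.

```python
import numpy as np

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("d", "e")]

# Build the symmetric 0/1 adjacency matrix: a[i, j] = 1 if (i, j) is an edge.
index = {name: k for k, name in enumerate(nodes)}
A = np.zeros((len(nodes), len(nodes)), dtype=float)
for i, j in edges:
    A[index[i], index[j]] = 1.0
    A[index[j], index[i]] = 1.0

# eigvalsh is intended for symmetric matrices and returns the real
# eigenvalues in ascending order.
spectrum = np.linalg.eigvalsh(A)

print(A.astype(int))
print(spectrum)
```

Because A is symmetric with a zero diagonal, the eigenvalues are real and sum to zero, and their squares sum to twice the number of edges.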
A simple distance measure between graphs having adjacency
matrices A and B, known as the edit distance, is to count the number
of edges that are not shared by both graphs,

$$
D(A, B) = \sum_{i,j} (a_{i,j} - b_{i,j})^2 .
$$

However, for unlabelled graphs we require some mapping h from
nodes i ∈ N_A to nodes h(i) ∈ N_B that minimizes the distance

$$
D(A, B) \geq D_h(A, B) = \sum_{i,j} (a_{i,j} - b_{h(i),h(j)})^2 .
$$

Given a spectrum (which is relatively cheap to compute) we have

$$
D(A, B) = \sum_{l} \left( \lambda_l^{(\alpha)} - \lambda_l^{(\beta)} \right)^2 .
$$
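
The difference between the two distances can be seen by relabelling the nodes of a graph. The sketch below makes two assumptions that the slide does not state: both graphs have the same number of nodes, and the eigenvalues are paired after sorting in ascending order. The function names are hypothetical.

```python
import numpy as np

def edit_distance(A, B):
    """Sum of squared entrywise differences between two adjacency matrices."""
    return float(np.sum((A - B) ** 2))

def spectral_distance(A, B):
    """Sum of squared differences between the sorted adjacency spectra."""
    lam_a = np.linalg.eigvalsh(A)   # ascending order
    lam_b = np.linalg.eigvalsh(B)   # ascending order
    return float(np.sum((lam_a - lam_b) ** 2))

# Five-node example graph (rows and columns ordered a, b, c, d, e).
A = np.array([[0, 1, 1, 1, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 0, 0],
              [1, 1, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

# The same graph with its nodes relabelled by an arbitrary permutation.
perm = np.array([4, 2, 0, 3, 1])
B = A[np.ix_(perm, perm)]

print("edit distance:    ", edit_distance(A, B))
print("spectral distance:", spectral_distance(A, B))
```

The edit distance is non-zero because it depends on the labelling, whereas the spectral distance vanishes (up to rounding) without any search over mappings h. Note that non-isomorphic graphs can also share a spectrum, so a spectral distance of zero does not by itself establish that two graphs are identical.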


Protein Interaction Network Data

   Species            Proteins   Interactions   Genome size   Sampling fraction
   S. cerevisiae          5035          22118          6532                0.77
   D. melanogaster        7506          22871         14076                0.53
   H. pylori               715           1423          1589                0.45
   E. coli                1888           7008          5416                0.35

[Figure: model probability (y-axis, 0.0 to 0.5) for the models DA, DAC, LPA, SF, DACL and DACR (x-axis), shown per organism: S. cerevisiae, D. melanogaster, H. pylori, E. coli.]

Model Selection
• Inference here was based on all the data, not summary statistics.
• Duplication models receive the strongest support from the data.
• Several models receive support and no model is chosen unambiguously.
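
Model probabilities of this kind can be estimated by rejection ABC over the joint space of models and parameters. The sketch below is a toy illustration and not the network analysis from the talk: two hypothetical one-parameter models (Gaussian and Laplace noise with equal variance) stand in for the network-evolution models, the model prior is uniform, and simulated and observed samples are compared through all their order statistics rather than a low-dimensional summary. All names and settings are assumptions.

```python
import numpy as np

def simulate(model, theta, n, rng):
    """Toy simulators standing in for the competing models (assumed)."""
    if model == 0:
        return rng.normal(theta, 1.0, size=n)               # model 0: Gaussian noise
    return rng.laplace(theta, 1.0 / np.sqrt(2.0), size=n)   # model 1: Laplace noise, variance 1

def abc_model_probabilities(x, eps, n_draws=20000, seed=0):
    """Estimate Pr(model | data) as the share of accepted draws per model."""
    rng = np.random.default_rng(seed)
    x_sorted = np.sort(x)
    accepted = np.zeros(2, dtype=int)
    for _ in range(n_draws):
        m = int(rng.integers(2))            # uniform prior over the two models
        theta = rng.uniform(-5.0, 5.0)      # assumed uniform parameter prior
        y = simulate(m, theta, x.size, rng)
        # Root-mean-square distance between the sorted samples.
        dist = np.sqrt(np.mean((np.sort(y) - x_sorted) ** 2))
        if dist < eps:
            accepted[m] += 1
    if accepted.sum() == 0:
        raise ValueError("no draws accepted; increase eps or n_draws")
    return accepted / accepted.sum()

data_rng = np.random.default_rng(1)
x = data_rng.normal(1.0, 1.0, size=20)      # stand-in for observed data
probs = abc_model_probabilities(x, eps=0.6)
print("Pr(M0 | x), Pr(M1 | x):", probs)
```

As on the slide, both models may retain appreciable probability; a tolerance that is too loose pushes the estimates back towards the model prior.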
[Figure: panels for the parameters δ and α (rows DA and DAC) and δ, α, p and m (rows DACL and DACR); legend: S. cerevisiae, D. melanogaster, H. pylori, E. coli.]

Considerate Use of ABC

• ABC is a tool for situations where conventional statistical
  approaches fail or are too cumbersome.
• If all the data are used then this is (relatively) unproblematic; if the
  data are compressed/corrupted then caution is required.
• Some of the issues arising in ABC mirror those also encountered
  in “conventional” statistics:
          “Any Bayesian inference uses the data only via the minimal
          sufficient statistic. This is because the calculation of the
          posterior distribution involves multiplying the likelihood by the
          prior and normalizing. Any factor of the likelihood that is a
          function of y alone will disappear after normalization.”
          D. Cox (2006).
• In other cases it seems prudent to accept the additional (and
  considerable) computational cost of constructing suitable summary
  statistics (such as in Barnes et al., Stat&Comp 2012).
Considerate Approaches to ABC Model Selection

  • 1. Considerate Approaches to ABC Model Selection Michael P.H. Stumpf, Christopher Barnes, Sarah Filippi, Thomas Thorne Theoretical Systems Biology Group 26/06/2012 Considerate Approaches to ABC Model Selection Stumpf et al. 1 of 15
  • 2. Evolving Networks (a) Duplication attachment (b) Duplication attachment with complimentarity wj (c) Linear preferential wi (d) General scale-free attachment Considerate Approaches to ABC Model Selection Stumpf et al. Model Selection 2 of 15
  • 7. Inference and Model Selection. We have observed data, $D$, that was generated by some system that we seek to describe by a mathematical model. In principle we can have a model set, $\mathcal{M} = \{M_1, \ldots, M_\nu\}$, where each model $M_i$ has an associated parameter $\theta_i$. We may know the different constituent parts of the system, $X_i$, and have measurements for some or all of them under some experimental designs, $T$. The model posterior is the likelihood times the prior, normalised by the evidence: $\Pr(M_i \mid T, D) = \dfrac{\Pr(D \mid M_i, T)\,\pi(M_i)}{\sum_{j=1}^{\nu} \Pr(D \mid M_j, T)\,\pi(M_j)}$. For complicated models and/or detailed data the likelihood evaluation can become prohibitively expensive. Approximate Inference: We can approximate the likelihood and/or the models. The “true” model is unlikely to be in $\mathcal{M}$ anyway.
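
The model posterior formula on this slide can be illustrated numerically. The following is a minimal sketch, assuming the evidences $\Pr(D \mid M_i, T)$ are already available as log values; the function name `model_posterior` and the three log evidences are hypothetical and not taken from the talk.

```python
import numpy as np

def model_posterior(log_evidence, prior):
    """Return Pr(M_i | T, D) for each model from log evidences and model priors."""
    log_evidence = np.asarray(log_evidence, dtype=float)
    prior = np.asarray(prior, dtype=float)
    # Work in log space and subtract the maximum to avoid numerical underflow.
    log_joint = log_evidence + np.log(prior)
    log_joint -= log_joint.max()
    weights = np.exp(log_joint)
    # Dividing by the sum over all models is the normalisation in the denominator.
    return weights / weights.sum()

# Hypothetical log evidences for three models under a uniform model prior.
posterior = model_posterior([-105.2, -103.9, -110.4], [1 / 3, 1 / 3, 1 / 3])
print(posterior)
```

Working with log evidences is a design choice made here for numerical stability; the slide itself states the formula in terms of plain probabilities.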
  • 10. Approximate Bayesian Computation. We can define the posterior as $p(\theta_i \mid x) = \dfrac{f(x \mid \theta_i)\,\pi(\theta_i)}{p(x)}$. Here $f(x \mid \theta_i)$ is the likelihood, which is often hard to evaluate; consider for example $y = \max[0,\ \tilde{y} + g_1 + \tilde{y} \times g_2]$ with $g_1, g_2 \sim N(0, \sigma_{1/2})$ and $\dfrac{d\tilde{y}}{dt} = g(\tilde{y}; \theta)$. But we can still simulate from the data-generating model, whence $p(\theta_i \mid x) = \displaystyle\int_{\mathcal{X}} \frac{\mathbb{1}(y = x)\, f(y \mid \theta_i)\,\pi(\theta_i)}{p(x)}\, dy \approx \int_{\mathcal{X}} \frac{\mathbb{1}(\Delta(y, x) < \epsilon)\, f(y \mid \theta_i)\,\pi(\theta_i)}{p(x)}\, dy$. Solutions for Complex Problems (?): Approximate (i) data, (ii) model or (iii) distance.
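
The approximation with the indicator $\mathbb{1}(\Delta(y, x) < \epsilon)$ corresponds to a rejection sampler: draw a parameter from the prior, simulate data, and keep the parameter if the simulated data are close to the observations. The sketch below is one possible reading of that idea on a toy normal model; the function `abc_rejection`, the uniform prior, the sorted-sample distance and the tolerance are all assumptions made for illustration, not choices documented in the talk.

```python
import numpy as np

def abc_rejection(x_obs, simulate, sample_prior, distance, eps, n_draws, rng):
    """Keep prior draws whose simulated data lie within eps of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)      # theta ~ pi(theta)
        y = simulate(theta, rng)       # y ~ f(y | theta)
        if distance(y, x_obs) < eps:   # indicator 1(Delta(y, x) < eps)
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(0)
x_obs = rng.normal(1.0, 1.0, size=20)  # toy data with true mean 1

samples = abc_rejection(
    x_obs,
    simulate=lambda mu, r: r.normal(mu, 1.0, size=x_obs.size),
    sample_prior=lambda r: r.uniform(-4.0, 4.0),
    # Assumed distance: root mean square difference between sorted samples.
    distance=lambda y, x: np.sqrt(np.mean((np.sort(y) - np.sort(x)) ** 2)),
    eps=0.5,
    n_draws=20000,
    rng=rng,
)
print(len(samples), samples.mean())
```

Smaller values of `eps` bring the accepted sample closer to the true posterior at the price of fewer acceptances.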
  • 13. ABC with Summary Statistics. If the data, $D$, are very complex and detailed, direct comparison between real and simulated data becomes prohibitive. In such situations, which originally motivated ABC approaches, summary statistics of the data are compared. We then have $p_{S,\epsilon}(\theta_i \mid D) \propto \displaystyle\int_{\mathcal{X}} \mathbb{1}\big(\Delta(S(x), S(y_\theta)) < \epsilon\big)\, f(y \mid \theta_i)\,\pi(\theta_i)\, dy$. Sufficient Statistics: This only works if the statistic $S(\cdot)$ is sufficient, i.e. if for $s = S(x)$ we have $p(x \mid s, \theta) = p(x \mid s)$. Sufficiency for Model Selection: If $S(\cdot)$ is sufficient for parameter estimation (in all models $i$ considered) it is not necessarily sufficient for model selection (Robert et al., PNAS (2011)).
  • 15. ABC with Summary Statistics. Generate data $X \sim N(1, 1)$ and use ABC to infer $\mu$ (assuming that $\sigma^2 = 1$ is known). [Figure: four panels (mean, var, min, max), each plotted against $\theta$ from −4 to 4.] Role of Summary Statistics: Mean (sufficient) correctly infers $\mu$. Max/Min capture some information on $\mu$. Var fails to capture any information on $\mu$. We need a way of constructing sets of statistics that together are (approximately) sufficient.
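
The comparison of mean, variance, minimum and maximum can be reproduced in a few lines. This is a sketch under assumed settings (50 observations, a flat prior on $[-4, 4]$, tolerance 0.1, fixed random seed); none of these values come from the slide, which only specifies $X \sim N(1, 1)$ with known variance.

```python
import numpy as np

rng = np.random.default_rng(1)
x_obs = rng.normal(1.0, 1.0, size=50)  # observed data, true mu = 1

n_draws = 50000
mu = rng.uniform(-4.0, 4.0, size=n_draws)  # assumed flat prior on mu
# One simulated data set per prior draw (rows), variance known and equal to 1.
y = rng.normal(mu[:, None], 1.0, size=(n_draws, x_obs.size))

statistics = {"mean": np.mean, "var": np.var, "min": np.min, "max": np.max}
eps = 0.1
accepted = {}
for name, stat in statistics.items():
    s_obs = stat(x_obs)
    s_sim = stat(y, axis=1)
    # Accept a draw when its summary statistic is within eps of the observed one.
    accepted[name] = mu[np.abs(s_sim - s_obs) < eps]
    print(name, len(accepted[name]), accepted[name].mean(), accepted[name].std())
```

In this setup the spread of the accepted values is smallest for the mean, larger for the minimum and maximum, and close to the prior spread for the variance, which matches the qualitative ranking stated on the slide.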
  • 18. A Closer Look at Summary Statistics. We interpret a summary statistic as a function, $S: \mathbb{R}^d \to \mathbb{R}^w$, $S(x) = s$. If $S$ is sufficient then (we include the model indicator variable in $\theta$) $p(\theta \mid x) = p(\theta \mid s)$. Information Theoretical Perspective: A summary statistic is an information compression device. Now let $\mathcal{S}$ be a set of statistics which together are sufficient. Then the mutual information satisfies $I(\Theta; X) = \displaystyle\int_{\Omega}\int_{\mathcal{X}} p(\theta, x) \log \frac{p(\theta, x)}{p(\theta)\,p(x)}\, d\theta\, dx = I(\Theta; \mathcal{S})$. Constructing Minimally Sufficient Summary Statistics: We seek the set $\mathcal{U} \subseteq \mathcal{S}$ with minimal cardinality such that $I(\Theta; \mathcal{S}) = I(\Theta; \mathcal{U})$.
  • 19. Constructing Sufficient Statistics. Proposition: Let $X$ be a random variable generated according to $f(\cdot \mid \theta)$. Let $\mathcal{S}$ be a summary statistic and $\mathcal{U}$ and $\mathcal{T}$ two subsets of $\mathcal{S}$ such that $U = \mathcal{U}(X)$, $T = \mathcal{T}(X)$ and $S = \mathcal{S}(X)$ satisfy $U \subset T \subset S$. We have $I(\Theta; S \mid T) = I(\Theta; S \mid U) - I(\Theta; T \mid U)$. In order to construct a subset $T$ of $S$ such that $I(\Theta; S \mid T) = 0$, it is thus sufficient to add statistics from $S$ one by one until the condition holds. If we denote by $\mathcal{S}_{(k)}$ the $k$th statistic to be added (with $k \le w$) we have $S_{(k)} = \mathcal{S}_{(k)}(X)$, and then $I(\Theta; S \mid S_{(1)}, \ldots, S_{(k+1)}) \le I(\Theta; S \mid S_{(1)}, \ldots, S_{(k)})$.
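
The identity in the proposition follows from the chain rule for mutual information. A short derivation, assuming only that $U \subset T \subset S$ (so that $T$ is determined by $S$ and $U$ is determined by $T$):

```latex
\begin{align*}
I(\Theta; S \mid U)
  &= I(\Theta; S, T \mid U)
     && \text{since } T \subset S \\
  &= I(\Theta; T \mid U) + I(\Theta; S \mid T, U)
     && \text{chain rule} \\
  &= I(\Theta; T \mid U) + I(\Theta; S \mid T)
     && \text{since } U \subset T .
\end{align*}
```

Rearranging gives $I(\Theta; S \mid T) = I(\Theta; S \mid U) - I(\Theta; T \mid U)$, and because conditional mutual information is non-negative, adding statistics to the conditioning set can only decrease $I(\Theta; S \mid \cdot)$.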
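  • 20. Constructing Sufficient Statistics. $I(\Theta; S \mid U) = \displaystyle\int_{\Omega}\int_{\mathcal{X}} p(\theta, \mathcal{S}(x), \mathcal{U}(x)) \log \frac{p(\theta, \mathcal{S}(x) \mid \mathcal{U}(x))}{p(\theta \mid \mathcal{U}(x))\, p(\mathcal{S}(x) \mid \mathcal{U}(x))}\, dx\, d\theta = \int_{\mathcal{X}} p(\mathcal{S}(x))\, \mathrm{KL}\big(p(\Theta \mid \mathcal{S}(x)) \,\|\, p(\Theta \mid \mathcal{U}(x))\big)\, dx = E_{p(X)}\big[\mathrm{KL}\big(p(\Theta \mid \mathcal{S}(X)) \,\|\, p(\Theta \mid \mathcal{U}(X))\big)\big]$. An Impossible Algorithm:
      • for all subsets $u^* \subseteq s^*$, perform ABC to obtain estimates $p(\Theta \mid u^*)$;
      • determine the set $A = \{u^* \subset s^* \text{ such that } \mathrm{KL}(p(\Theta \mid s^*) \,\|\, p(\Theta \mid u^*)) = 0\}$;
      • the desired subset is $\arg\min_{u^* \in A} |u^*|$.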
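
In practice the Kullback-Leibler divergence between two ABC posteriors has to be estimated from samples. The slide does not say which estimator was used; the sketch below is an assumed, simple histogram-based estimate for one-dimensional samples, with a hypothetical function name and a small pseudocount to avoid division by zero.

```python
import numpy as np

def kl_from_samples(p_samples, q_samples, bins=30, pseudocount=1e-9):
    """Histogram estimate of KL(p || q) from two one-dimensional samples."""
    lo = min(np.min(p_samples), np.min(q_samples))
    hi = max(np.max(p_samples), np.max(q_samples))
    # Use a common binning so that both histograms are comparable.
    p, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    p = p / p.sum() + pseudocount
    q = q / q.sum() + pseudocount
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, size=5000)
b = rng.normal(0.0, 1.0, size=5000)   # same distribution as a
c = rng.normal(2.0, 1.0, size=5000)   # shifted distribution
print(kl_from_samples(a, b), kl_from_samples(a, c))
```

Because the estimate between two finite samples from the same distribution is small but not exactly zero, a condition such as $\mathrm{KL} = 0$ can only be checked against a threshold.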
  • 21. Constructing Sufficient Statistics.
      input: a sufficient set of statistics whose values on the dataset are $s^* = \{s^*_1, \ldots, s^*_w\}$, a threshold $\delta$
      output: a subset $v^*$ of $s^*$
      choose randomly $u^*$ in $s^*$
      $v^* \leftarrow u^*$
      $q^* \leftarrow s^* \setminus v^*$
      repeat
          repeat
              if $q^* = \emptyset$ then return $v^*$ end if
              choose randomly $u^*$ in $q^*$
              $q^* \leftarrow q^* \setminus u^*$
              perform ABC to obtain $p(\Theta \mid v^*, u^*)$
          until $\mathrm{KL}(p(\Theta \mid v^*, u^*) \,\|\, p(\Theta \mid v^*)) \ge \delta$
          optionally: $v^* \leftarrow \mathrm{OrderDependency}(v^*, u^*)$
          $v^* \leftarrow v^* \cup u^*$
          $q^* \leftarrow s^* \setminus v^*$
      until $q^* = \emptyset$
      return $v^*$
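
The greedy procedure above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the ABC step and the KL estimate are passed in as callables (`abc_posterior` and `kl`, both hypothetical names), the optional OrderDependency step is omitted, and the usage example replaces real ABC with toy stand-ins in which only the statistic "mean" carries information.

```python
import random

def select_statistics(stats, abc_posterior, kl, delta, rng):
    """Greedily add statistics until no remaining one changes the posterior by >= delta."""
    stats = list(stats)
    chosen = [rng.choice(stats)]                         # v* <- random u*
    candidates = [s for s in stats if s not in chosen]   # q* <- s* \ v*
    while candidates:
        added = None
        while candidates:
            u = rng.choice(candidates)
            candidates.remove(u)                         # q* <- q* \ u*
            gain = kl(abc_posterior(chosen + [u]), abc_posterior(chosen))
            if gain >= delta:
                added = u
                break
        if added is None:
            return chosen        # q* exhausted without an informative statistic
        chosen.append(added)                             # v* <- v* U u*
        # Previously rejected statistics become candidates again.
        candidates = [s for s in stats if s not in chosen]
    return chosen

# Toy stand-ins (assumptions): the "posterior" is just the set of informative
# statistics in the subset, and the "KL" is 1 if two such sets differ, else 0.
informative = {"mean"}
toy_posterior = lambda subset: frozenset(subset) & informative
toy_kl = lambda p, q: 0.0 if p == q else 1.0

result = select_statistics(["mean", "var", "range", "max"],
                           toy_posterior, toy_kl, delta=0.5,
                           rng=random.Random(0))
print(result)
```

Note that the first statistic is chosen at random and always retained, so the returned subset can contain one uninformative statistic; this mirrors the pseudocode as transcribed.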
  • 22. Examples: Normal Distributions. $y_1, \ldots, y_d \sim N(\mu, \sigma_1^2)$ and $y_1, \ldots, y_d \sim N(\mu, \sigma_2^2)$. [Figure: two panels showing Run (20 to 100) against the statistics mean, S2, range, max, random.]
  • 23. Examples: Normal Distributions. $y_1, \ldots, y_d \sim N(\mu, \sigma_1^2)$ and $y_1, \ldots, y_d \sim N(\mu, \sigma_2^2)$. [Figure: two scatter panels of log(BF) ABC against log(BF) predicted.]
  • 26. Examples: Population Genetics. [Figure: three panels (Constant Population Size; Exponential Population Growth; Two-Island Model with Migration), each showing Run (20 to 100) against statistics S1 to S11.] [S1] Number of Segregating Sites; [S2] Number of Distinct Haplotypes; [S3] Haplotype Homozygosity; [S4] Average SNP Homozygosity; [S5] Number of occurrences of most common haplotype; [S6] Mean number of pair-wise differences between haplotypes; [S7] Number of Singleton Haplotypes; [S8] Number of Singleton SNPs; [S9] Linkage Disequilibrium. Summary Statistic Choice: The choice of summary statistics appears to depend subtly on the true data-generating model. In light of coalescent processes this is, however, to be expected.
  • 27. Examples: Random Walks. [Figure: three panels (Classical Random Walk; Persistent Random Walk; Biased Random Walk), each showing Run (20 to 100) against statistics S1 to S5.] [S1] Mean square displacement; [S2] Mean x and y displacement; [S3] Mean square x and y displacement; [S4] Straightness index; [S5] Eigenvalues of gyration tensor. Parameter Sufficiency for Complex Problems: Here all statistics that have been chosen for parameter estimation are also chosen for model selection.
  • 28. Conditioning on Information. [Figure: $\Theta$ with statistics $s_1$, $s_2$, $s_3$.]
  • 31. Conditioning on Information. [Figure: $\Theta$ with $s_1$, $s_2$ and $x$.] What is the meaning of $p(\theta \mid s_0, s_1, \ldots, s_n)$? Let $s = (s_0, s_1, \ldots, s_n)$, and assume $I(\theta, s) < I(\theta, x)$ but $\epsilon \to 0$. This can happen for sufficient and ancillary $s$. In the latter case we obtain $p(\theta \mid s) = \pi(\theta)$. Statistics: Sufficient: Implicates same area as full data. Ancillary: Implicates all values of $\theta$ equally. How about $p(t \mid s)$ if $s$ is not (quite) sufficient?
  • 33. Model Selection vs. Model Checking. Model Selection: Several models $M \in \mathcal{M}$ are compared and one or more are chosen in light of the data: Find models which are better than others. Model Checking: The quality of a model $M_i$ is assessed against the available data: Determine if a model is actually ‘good’. Alternative Approach: ABCµ [Ratmann et al., PNAS]. Posterior Predictive Checks: We are interested in the posterior predictive distribution, $p(t(X) \mid s(X)) = \displaystyle\int_{\Theta} p(t(X) \mid \theta)\, p(\theta \mid s(X))\, d\theta$. In particular we have $p(s(X) \mid s(X)) \neq p(s(X) \mid X)$ unless $t(X)$ is sufficient.
  • 34. ABC on Network Data. [Figure panels: (e) Duplication attachment; (f) Duplication attachment with complementarity; (g) Linear preferential attachment; (h) General scale-free (labels $w_i$, $w_j$).]
  • 35. ABC on Network Data. Summarizing Networks:
      • Data are noisy and incomplete.
      • We can simulate models of network evolution, but we cannot calculate likelihoods for any but very trivial models.
      • There is also no sufficient statistic that would allow us to summarize networks, so ABC approaches require some thought.
      • Many possible summary statistics of networks are expensive to calculate.
      Full likelihood: Wiuf et al., PNAS (2006). ABC: Ratmann et al., PLoS Comp. Biol. (2008). ABC (better): Thorne & Stumpf, J. Roy. Soc. Interface (2012). Stumpf & Wiuf, J. Roy. Soc. Interface (2010).
  • 36. Spectral Distances. [Figure: graph on nodes a, b, c, d, e.] With rows and columns ordered a, b, c, d, e, the adjacency matrix is $A = \begin{pmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$. Graph Spectra: Given a graph $G$ with nodes $N$ and edges $(i, j) \in E$ with $i, j \in N$, the adjacency matrix, $A$, of the graph is defined by $a_{i,j} = 1$ if $(i, j) \in E$ and $a_{i,j} = 0$ otherwise. The eigenvalues, $\lambda$, of this matrix provide one way of defining the graph spectrum.
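
The spectrum of the example graph can be computed directly from its adjacency matrix. A minimal sketch using NumPy's symmetric eigenvalue routine; the node ordering a, b, c, d, e follows the matrix on the slide, while the variable names are chosen here for illustration.

```python
import numpy as np

# Adjacency matrix of the example graph, rows and columns ordered a, b, c, d, e.
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# An undirected graph has a symmetric adjacency matrix, so eigvalsh applies
# and returns real eigenvalues in ascending order.
assert (A == A.T).all()
spectrum = np.linalg.eigvalsh(A)
print(spectrum)
```

Two quick sanity checks hold for any simple undirected graph: the eigenvalues sum to zero (the trace of $A$) and their squares sum to twice the number of edges, here $2 \times 6 = 12$.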
  • 39. Spectral Distances. A simple distance measure between graphs having adjacency matrices $A$ and $B$, known as the edit distance, is to count the number of edges that are not shared by both graphs, $D(A, B) = \sum_{i,j} (a_{i,j} - b_{i,j})^2$. However, for unlabelled graphs we require some mapping $h$ from $i \in N_A$ to $i' \in N_B$ that minimizes the distance, $D_h(A, B) = \sum_{i,j} (a_{i,j} - b_{h(i),h(j)})^2$. Given a spectrum (which is relatively cheap to compute) we have $D(A, B) = \sum_{l} \big(\lambda_l^{(\alpha)} - \lambda_l^{(\beta)}\big)^2$.
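
The difference between the labelled edit distance and the spectral distance shows up on a small example. The sketch below assumes both graphs have the same number of nodes and compares eigenvalues in sorted order; the function names and the three-node path graph are illustrative choices, not taken from the talk.

```python
import numpy as np

def edit_distance(A, B):
    """Sum over ordered pairs (i, j) of squared differences in adjacency entries."""
    return int(np.sum((A - B) ** 2))

def spectral_distance(A, B):
    """Squared Euclidean distance between the sorted adjacency spectra."""
    lam_a = np.linalg.eigvalsh(A)   # ascending order
    lam_b = np.linalg.eigvalsh(B)
    return float(np.sum((lam_a - lam_b) ** 2))

# Path graph 0 - 1 - 2 and the same graph with nodes 0 and 1 relabelled.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
perm = [1, 0, 2]
B = A[np.ix_(perm, perm)]

print(edit_distance(A, B), spectral_distance(A, B))
```

For symmetric matrices the double sum counts every unshared undirected edge twice, so the two unshared edges here give an edit distance of 4, while the spectral distance is zero up to rounding because relabelling does not change the eigenvalues. The converse does not hold in general, since non-isomorphic graphs can share a spectrum.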
  • 42. Protein Interaction Network Data.
      Species            Proteins   Interactions   Genome size   Sampling fraction
      S. cerevisiae          5035          22118          6532                0.77
      D. melanogaster        7506          22871         14076                0.53
      H. pylori               715           1423          1589                0.45
      E. coli                1888           7008          5416                0.35
      [Figure: model probability (0.0 to 0.5) for the models DA, DAC, LPA, SF, DACL, DACR, by organism: S. cerevisiae, D. melanogaster, H. pylori, E. coli.]
      Model Selection:
      • Inference here was based on all the data, not summary statistics.
      • Duplication models receive the strongest support from the data.
      • Several models receive support and no model is chosen unambiguously.
  • 43. Protein Interaction Network Data. [Figure: panels by model and parameter for S. cerevisiae, D. melanogaster, H. pylori and E. coli: DA ($\delta$, $\alpha$); DAC ($\delta$, $\alpha$); DACL ($\delta$, $\alpha$, $p$, $m$); DACR ($\delta$, $\alpha$, $p$, $m$).]
  • 44. Considerate Use of ABC.
      • ABC is a tool for situations where conventional statistical approaches fail or are too cumbersome.
      • If all the data are used then this is (relatively) unproblematic; if the data are compressed/corrupted then caution is required.
      • Some of the issues arising in ABC mirror those also encountered in “conventional” statistics: “Any Bayesian inference uses the data only via the minimal sufficient statistic. This is because the calculation of the posterior distribution involves multiplying the likelihood by the prior and normalizing. Any factor of the likelihood that is a function of y alone will disappear after normalization.” D. Cox (2006).
      • In other cases it seems prudent to accept the additional (and considerable) computational cost of constructing suitable summary statistics (such as in Barnes et al., Stat&Comp 2012).
  • 45. Acknowledgements