Bayesian inference for the chemical master equation using approximate models

Colin Gillespie

Joint work with Andrew Golightly
Modelling
Represent a biochemical network with a set of (pseudo-)biochemical reactions:
    Specify the rate laws and rate constants of the reactions
    Run a stochastic or deterministic computer simulator of the system dynamics
This forward problem is straightforward using the Gillespie algorithm. The reverse problem is trickier: given time-course data and a set of reactions, can we recover the rate constants?




Modelling
Generically, we have k species and r reactions, with a typical reaction

    R_i : u_{i1} Y_1 + \cdots + u_{ik} Y_k \xrightarrow{c_i} v_{i1} Y_1 + \cdots + v_{ik} Y_k

Stochastic rate constant: c_i

Hazard / instantaneous rate: h_i(Y(t), c_i), where Y(t) = (Y_1(t), \ldots, Y_k(t)) is the current state of the system and

    h_i(Y(t), c_i) = c_i \prod_{j=1}^{k} \binom{Y_j(t)}{u_{ij}}
Modelling
Some remarks:
    This setup describes a Markov jump process (MJP)
    The effect of reaction R_i is to change the value of each Y_j by v_{ij} - u_{ij}
    It can be shown that the time to the next reaction is

        \tau \sim \text{Exp}\{h_0(Y(t), c)\}, where h_0(Y(t), c) = \sum_{i=1}^{r} h_i(Y(t), c_i),

    and the reaction is of type i with probability h_i(Y(t), c_i) / h_0(Y(t), c)
    Hence, the process is easily simulated; this technique is known as the Gillespie algorithm
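The two facts above (exponential waiting time, reaction type chosen in proportion to its hazard) give the whole algorithm. A minimal sketch, with illustrative function and argument names not taken from the talk:

```python
import math
import random

def gillespie(y, c, stoich_pre, stoich_post, t_max, seed=1):
    """Simulate a Markov jump process with the Gillespie algorithm.

    y           -- list of current molecule counts, one per species
    c           -- list of stochastic rate constants, one per reaction
    stoich_pre  -- u[i][j]: molecules of species j consumed by reaction i
    stoich_post -- v[i][j]: molecules of species j produced by reaction i
    """
    rng = random.Random(seed)
    t, path = 0.0, [(0.0, list(y))]
    while True:
        # Hazard of reaction i: h_i = c_i * prod_j C(Y_j, u_ij)
        h = [ci * math.prod(math.comb(yj, uij)
                            for yj, uij in zip(y, u))
             for ci, u in zip(c, stoich_pre)]
        h0 = sum(h)
        if h0 == 0.0:              # no reaction can fire
            break
        t += rng.expovariate(h0)   # time to next reaction: tau ~ Exp(h0)
        if t > t_max:
            break
        # reaction is of type i with probability h_i / h0
        i = rng.choices(range(len(c)), weights=h)[0]
        # apply the net effect v_ij - u_ij to each species
        y = [yj + vij - uij for yj, vij, uij
             in zip(y, stoich_post[i], stoich_pre[i])]
        path.append((t, list(y)))
    return path
```

For example, a single species under immigration and death (R_1: ∅ → Y, R_2: Y → ∅) has stoich_pre = [[0], [1]] and stoich_post = [[1], [0]].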
Inference for the exact model
Aim: infer the c_i given time-course biochemical data. Following Boys et al. (2008) and Wilkinson (2006):
    Suppose we observe the entire process Y over [0, T]
    The ith unit interval contains n_i reactions with times and types (t_{ij}, k_{ij}), j = 1, 2, \ldots, n_i
    Hence, the likelihood for c is

        \pi(Y | c) = \prod_{i=0}^{T-1} \prod_{j=1}^{n_i} h_{k_{ij}}\big(Y(t_{i,j-1}), c_{k_{ij}}\big) \exp\left\{ -\int_0^T h_0(Y(t), c)\, dt \right\}
Inference for the exact model
Problem: it is not feasible to observe all reaction times and types
    Assume data are observed on a regular grid, with

        Y_{0:T} = \{ Y(t) = (Y_1(t), Y_2(t), \ldots, Y_k(t)) : t = 0, 1, 2, \ldots, T \}

Idea: use a Gibbs sampler to alternate between draws of
 1. the times and types of reactions in (0, T], conditional on c and the observations,
 2. each c_i, conditional on the augmented data

Note that step 1 can be performed for each interval (i, i + 1] in turn, due to the factorisation of \pi(Y | Y_{0:T}, c)
Difficulties
These techniques do not scale well to problems of realistic size and complexity
    The true process is discrete and stochastic; the stochasticity is vital, but what about the discreteness?
    Treating molecule numbers as continuous and performing exact inference for the resulting approximate model appears to be promising
Bayesian filtering
Suppose we have noisy observations X_{0:(i-1)} = \{X(t) : t = 0, 1, \ldots, i - 1\} where, for example,

    X(t) = Y(t) + \epsilon_t, \qquad \epsilon_t \sim N(0, \Sigma)

Goal: generate a sample from \pi[c, Y(i) | X_{0:i}] given a new datum X(i)

    \pi[c, Y(i) | X_{0:i}] \propto \int \pi[c, Y(i-1) | X_{0:i-1}] \, \pi_h[Y_{(i-1,i]} | c] \, \pi[Y(i) | X(i)] \, dY_{(i-1,i)}
                           \propto prior \times likelihood

where Y_{(i-1,i]} = \{Y(t) : t \in (i-1, i]\} is the latent path in (i-1, i] and \pi_h[\,\cdot\, | c] is the associated density under the simulator

Idea: if we can sample \pi[c, Y(i-1) | X_{0:i-1}], then we can use MCMC to sample the target \pi[c, Y(i) | X_{0:i}]
Idealised MCMC scheme
 1. Propose (c^*, Y(i-1)^*) \sim \pi[\,\cdot\, | X_{0:i-1}]
 2. Draw Y^*_{(i-1,i]} \sim \pi_h[\,\cdot\, | c^*] by forward simulation
 3. Accept/reject with probability

        \min\left\{ 1, \frac{\pi[Y(i)^* | X(i)]}{\pi[Y(i) | X(i)]} \right\}

Since the simulator is used as a proposal process, we do not need to evaluate its associated density.

Similar to Approximate Bayesian Computation (ABC)
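One iteration of this scheme can be sketched as follows; the function and argument names are hypothetical, and the sample from \pi[\,\cdot\, | X_{0:i-1}] is represented here by a list of samples, anticipating the particle approach described next:

```python
import math
import random

def mcmc_step(particles, simulate, obs_loglik, x_new, state, rng):
    """One iteration of the idealised scheme targeting pi[c, Y(i) | X_{0:i}].

    particles  -- list of (c, y_prev) samples approximating the time i-1 posterior
    simulate   -- forward simulator: (c, y_prev) -> y_new over one unit interval
    obs_loglik -- log pi[y | x], the log density of the latent state given the datum
    x_new      -- the new observation X(i)
    state      -- current chain state, a tuple (c, y_prev, y_new)
    """
    _, _, y_cur = state
    # Step 1: propose (c*, Y(i-1)*) from the approximation to pi[. | X_{0:i-1}]
    c_star, y_prev_star = rng.choice(particles)
    # Step 2: draw Y* over (i-1, i] by forward simulation -- the simulator's
    # density is only ever sampled from, never evaluated
    y_star = simulate(c_star, y_prev_star)
    # Step 3: accept with probability min{1, pi[Y(i)* | X(i)] / pi[Y(i) | X(i)]}
    log_a = obs_loglik(y_star, x_new) - obs_loglik(y_cur, x_new)
    if rng.random() < math.exp(min(0.0, log_a)):
        return (c_star, y_prev_star, y_star)
    return state
```

Note that only the observation density appears in the acceptance ratio, which is what makes the scheme usable with any forward simulator.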




A particle approach
Since \pi[c, Y(i-1) | X_{0:i-1}] typically has no analytic form, we approximate this density with an equally weighted cloud of points, or particles; hence the term particle filter
The scheme is initialised with a sample from the prior, and the posterior sample calculated at time i - 1 is used as the prior at time i
To avoid sample impoverishment, in step 1 of the scheme we sample from a kernel density estimate (KDE) of \pi[c, Y(i-1) | X_{0:i-1}] by adding zero-mean Normal noise to each proposed particle
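Sampling from this Gaussian KDE amounts to resampling a particle and jittering it; a minimal sketch, where the bandwidth h is a tuning choice and the names are illustrative:

```python
import random

def sample_kde(particles, h, rng):
    """Draw from a Gaussian kernel density estimate of the particle cloud:
    pick one of the equally weighted particles uniformly at random, then
    add zero-mean Normal noise (standard deviation h) to each component."""
    theta = rng.choice(particles)
    return [component + rng.gauss(0.0, h) for component in theta]
```

Letting h shrink as the number of particles N grows reduces the bias this perturbation introduces, which is the second approximation discussed on the next slide.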




Three types of approximation
The distribution \pi[c, Y(i-1) | X_{0:i-1}] is represented as a discrete distribution; in theory, as the number of particles N increases, the approximation improves
We perturb the particles; as N increases, the perturbation decreases
Simulator:
    Exact: Gillespie
    Approximate: SDE, but this can perform poorly for low molecule numbers
    Approximate: ODE, but this ignores stochasticity
Case study
    We wish to compare the hybrid scheme with inferences made assuming the "true" discrete, stochastic model
    To allow such inferences, consider a toy example:

Autoregulatory network

    Immigration:    R_1: \emptyset \xrightarrow{c_1} Y_1        R_2: \emptyset \xrightarrow{c_2} Y_2
    Death:          R_3: Y_1 \xrightarrow{c_3} \emptyset        R_4: Y_2 \xrightarrow{c_4} \emptyset
    Interaction:    R_5: Y_1 + Y_2 \xrightarrow{c_5} 2Y_2

    Corrupt the data by constructing

        X_i(t) | Y_i(t) \sim \begin{cases} \text{Poisson}(Y_i(t)) & \text{if } Y_i(t) > 0, \\ \text{Bernoulli}(0.1) & \text{if } Y_i(t) = 0, \end{cases}

    for each i = 1, 2, so that Y(t) is not observed anywhere
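This corruption mechanism is easy to simulate. A sketch using Knuth's inversion method for the Poisson draw (adequate for the small counts here); the function name is illustrative:

```python
import math
import random

def corrupt(y, rng):
    """Observe a true count y through the noise model above:
    X | Y ~ Poisson(Y) when Y > 0, and X | Y ~ Bernoulli(0.1) when Y = 0,
    so the latent state is never observed exactly."""
    if y == 0:
        return 1 if rng.random() < 0.1 else 0
    # Knuth's Poisson sampler: count uniforms until their product drops below e^{-y}
    limit = math.exp(-y)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1
```

Applying corrupt to each Y_i(t) on the observation grid yields the noisy X_i(t) used for inference.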
Stochastic vs deterministic
[Figure: simulated population size against time (0 to 1000), one panel per value of c_2 \in \{10, 1.0, 0.1, 0.01\}, comparing stochastic and deterministic trajectories.]
Stochastic vs deterministic
[Figure: ODE steady state versus stochastic steady state, plotting Y_2 against Y_1 (both on log scales, roughly 10^1 to 10^6), for scale factors from 10^{-5} to 10^{-1}, with c_2 = 10 \times scale (the Y_2 immigration rate) and c_3 = 0.01 \times scale (the Y_1 death rate).]
Autoregulatory network

Initial conditions: Y_1(0) = Y_2(0) = 10

Rate constants: c_1 = 10, c_2 = 0.1, c_3 = 0.1, c_4 = 0.7 and c_5 = 0.008

[Figure: population level against time (0 to 100); Y_1 is the solid red line, Y_2 the blue dashed lines, and the solid points are the twenty-one noised observations used for inference.]
Results: N = 30,000 particles

Vague priors were used.

[Figure: posterior densities for c_1, \ldots, c_5, comparing the Gillespie, SDE and ODE simulators.]
Summary
Inferring the rate constants that govern discrete stochastic kinetic models is computationally challenging
It appears promising to approximate the model and then perform exact inference using the approximate model
In this case study, the SDE and ODE models performed surprisingly well
    Although the ODE model had poor MCMC mixing properties
Future
A hybrid forward simulator (or indeed any forward simulator) can be used as a proposal process inside a particle filter
    However, for this model the Salis & Kaznessis hybrid simulator is slower than the standard Gillespie algorithm
A particle MCMC approach provides an exact inference procedure (see the recent Andrieu et al. JRSS B read paper)
    A preprint using this PMCMC approach is available; that paper also considers the hybrid simulator and the linear noise approximation
Boys, R. J., Wilkinson, D. J. and Kirkwood, T. B. L. (2008). Bayesian inference for a discretely observed stochastic kinetic model. Statistics and Computing 18.
Gillespie, C. S. and Golightly, A. (2010). Bayesian inference for generalized stochastic population growth models with application to aphids. JRSS Series C 59(2).
Heron, E. A., Finkenstädt, B. and Rand, D. A. (2007). Bayesian inference for dynamic transcriptional regulation; the Hes1 system as a case study. Bioinformatics 23.
Salis, H. and Kaznessis, Y. (2005). Accurate hybrid simulation of a system of coupled biochemical reactions. Journal of Chemical Physics 122.
Wilkinson, D. J. (2006). Stochastic Modelling for Systems Biology. Chapman & Hall/CRC Press.

Contact details...
email: colin.gillespie@ncl.ac.uk
www: http://www.mas.ncl.ac.uk/~ncsg3/





WCSB 2012

  • 1. Bayesian inference for the chemical master equation using approximate models Colin Gillespie Joint work with Andrew Golightly
  • 2. Modelling Represent a biochemical network with a set of (pseudo-)biochemical reactions. Specify the rate laws and rate constants of the reactions. Run some stochastic or deterministic computer simulator of the system dynamics; this is straightforward using the Gillespie algorithm. The reverse problem is trickier: given time-course data and a set of reactions, can we recover the rate constants? 2/19
  • 3. Modelling Generically, k species and r reactions, with a typical reaction $R_i : u_{i1} Y_1 + \cdots + u_{ik} Y_k \xrightarrow{c_i} v_{i1} Y_1 + \cdots + v_{ik} Y_k$. Stochastic rate constant: $c_i$. Hazard / instantaneous rate: $h_i(Y(t), c_i)$, where $Y(t) = (Y_1(t), \ldots, Y_k(t))$ is the current state of the system and $h_i(Y(t), c_i) = c_i \prod_{j=1}^{k} \binom{Y_j(t)}{u_{ij}}$. 3/19
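As a small illustration of the mass-action hazard above (a product of binomial coefficients over the reactant species), a sketch; the function name and argument layout are mine, not from the slides:

```python
from math import comb

def hazard(y, c_i, u_i):
    # h_i(Y(t), c_i) = c_i * prod_j C(Y_j(t), u_ij): c_i times the number
    # of ways the u_ij reactant molecules of each species can be chosen.
    h = c_i
    for yj, uij in zip(y, u_i):
        h *= comb(yj, uij)
    return h

# Dimerisation 2 Y1 -> Y2 with c_i = 0.5 at state (Y1, Y2) = (10, 0):
# h = 0.5 * C(10, 2) = 22.5
h_dimer = hazard([10, 0], 0.5, [2, 0])
```

Note that the hazard is automatically zero whenever any species has fewer molecules than the reaction consumes, since the binomial coefficient vanishes.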
  • 4. Modelling Some remarks: This setup describes a Markov jump process (MJP). The effect of reaction $R_i$ is to change the value of each $Y_j$ by $v_{ij} - u_{ij}$. It can be shown that the time to the next reaction is $\tau \sim \mathrm{Exp}\{h_0(Y(t), c)\}$, where $h_0(Y(t), c) = \sum_{i=1}^{r} h_i(Y(t), c_i)$, and that the reaction is of type i with probability $h_i(Y(t), c_i)/h_0(Y(t), c)$. Hence, the process is easily simulated (and this technique is known as the Gillespie algorithm). 4/19
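The two steps above (exponential waiting time, then a hazard-weighted choice of reaction type) give a minimal sketch of the Gillespie algorithm. The names and the toy immigration-death example below are mine, assuming a user-supplied hazards function:

```python
import random

def gillespie(y, c, stoich, hazards, t_max, seed=42):
    # y: initial molecule counts; stoich[i]: net change v_i - u_i of reaction i
    # hazards(y, c): list of current hazards h_i(Y(t), c_i)
    rng = random.Random(seed)
    t, path = 0.0, [(0.0, tuple(y))]
    while True:
        h = hazards(y, c)
        h0 = sum(h)
        if h0 == 0.0:                      # no reaction can fire
            break
        t += rng.expovariate(h0)           # tau ~ Exp{h0(Y(t), c)}
        if t >= t_max:
            break
        u, acc = rng.random() * h0, 0.0    # pick type i w.p. h_i / h0
        for i, hi in enumerate(h):
            acc += hi
            if u < acc:
                break
        y = [yj + s for yj, s in zip(y, stoich[i])]
        path.append((t, tuple(y)))
    return path

# Toy immigration-death model: {} -> Y at rate c1, Y -> {} at rate c2*Y
path = gillespie([10], [1.0, 0.1], [[1], [-1]],
                 lambda y, c: [c[0], c[1] * y[0]], t_max=5.0)
```

Each entry of `path` records one reaction event, so consecutive states differ by exactly one net-change vector.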
  • 5. Inference for the exact model Aim: infer the $c_i$ given time-course biochemical data. Following Boys et al. (2008) and Wilkinson (2006): Suppose we observe the entire process Y over [0, T]. The i-th unit interval contains $n_i$ reactions with times and types $(t_{ij}, k_{ij})$, $j = 1, 2, \ldots, n_i$. Hence, the likelihood for c is $\pi(Y|c) = \prod_{i=0}^{T-1} \prod_{j=1}^{n_i} h_{k_{ij}}\left(Y(t_{i,j-1}), c_{k_{ij}}\right) \exp\left\{-\int_0^T h_0(Y(t), c)\, dt\right\}$. 5/19
  • 7. Inference for the exact model Problem: it is not feasible to observe all reaction times and types. Assume data are observed on a regular grid, with $Y_{0:T} = \{Y(t) = (Y_1(t), Y_2(t), \ldots, Y_k(t)) : t = 0, 1, 2, \ldots, T\}$. Idea: use a Gibbs sampler to alternate between draws of 1. the times and types of reactions in (0, T], conditional on c and the observations; 2. each $c_i$, conditional on the augmented data. Note that step 1 can be performed for each interval (i, i+1] in turn, due to the factorisation of $\pi(Y|Y_{0:T}, c)$. 6/19
  • 11. Difficulties These techniques do not scale well to problems of realistic size and complexity. The true process is discrete and stochastic: the stochasticity is vital, but what about the discreteness? Treating molecule numbers as continuous and performing exact inference for the resulting approximate model appears to be promising... 7/19
  • 12. Bayesian filtering Suppose we have noisy observations $X_{0:(i-1)} = \{X(t) : t = 0, 1, \ldots, i-1\}$ where, for example, $X(t) = Y(t) + \epsilon$, $\epsilon \sim N(0, \Sigma)$. Goal: generate a sample from $\pi[c, Y(i)|X_{0:i}]$ given a new datum X(i): $\pi[c, Y(i)|X_{0:i}] \propto \int \pi[c, Y(i-1)|X_{0:(i-1)}]\, \pi_h\left[Y_{(i-1,i]}|c\right] \pi[Y(i)|X(i)]\, dY_{[i-1,i)} \propto \text{prior} \times \text{likelihood}$, where $Y_{(i-1,i]} = \{Y(t) : t \in (i-1, i]\}$ is the latent path in (i-1, i] and $\pi_h[\cdot|c]$ is the associated density under the simulator. Idea: if we can sample $\pi[c, Y(i-1)|X_{0:(i-1)}]$ then we can use MCMC to sample the target $\pi[c, Y(i)|X_{0:i}]$. 8/19
  • 15. Idealised MCMC scheme 1. Propose $(c^*, Y(i-1)^*) \sim \pi[\,\cdot\,|X_{0:(i-1)}]$. 2. Draw $Y^*_{(i-1,i]} \sim \pi_h[\,\cdot\,|c^*]$ by forward simulation. 3. Accept/reject with probability $\min\left\{1, \frac{\pi[Y(i)^*|X(i)]}{\pi[Y(i)|X(i)]}\right\}$. Since the simulator is used as a proposal process, we don't need to evaluate its associated density. Similar to Approximate Bayesian Computation (ABC). 9/19
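The three steps above can be sketched as a single MCMC move. This is only a sketch under assumed interfaces: `simulate` stands in for any forward simulator and `log_obs_dens` for the observation density term; none of these names come from the slides.

```python
import math
import random

def mcmc_step(particles, simulate, log_obs_dens, x_i, current, rng):
    # Step 1: propose (c*, Y(i-1)*) from the (equally weighted)
    # particle approximation of pi[. | X_{0:i-1}].
    c_star, y_prev = rng.choice(particles)
    # Step 2: draw Y*_{(i-1,i]} by forward simulation; keep the endpoint Y(i)*.
    y_star = simulate(c_star, y_prev, rng)
    # Step 3: accept w.p. min{1, pi[Y(i)*|X(i)] / pi[Y(i)|X(i)]};
    # the simulator's own density never has to be evaluated.
    log_alpha = log_obs_dens(y_star, x_i) - log_obs_dens(current[1], x_i)
    if math.log(rng.random()) < log_alpha:
        return c_star, y_star
    return current
```

With a deterministic toy simulator the move is easy to check: a proposal whose endpoint matches the datum exactly is always accepted.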
  • 16. A particle approach Since $\pi[c, Y(i-1)|X_{0:(i-1)}]$ does not (typically) have an analytic form, we approximate this density with an equally weighted cloud of points, or particles; hence the term particle filter. The scheme is initialised with a sample from the prior, and the posterior sample calculated at time i-1 is used as the prior at time i. To avoid sample impoverishment, in step 1 of the scheme we sample from the kernel density estimate (KDE) of $\pi[c, Y(i-1)|X_{0:(i-1)}]$ by adding zero-mean Normal noise to each proposed particle. 10/19
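Sampling from the Gaussian KDE of the particle cloud amounts to resampling a particle uniformly and then jittering it with zero-mean Normal noise. A minimal sketch (the function name and the bandwidth value are illustrative choices of mine):

```python
import random

def sample_kde(particles, sd, rng):
    # A draw from a Gaussian KDE with bandwidth sd: resample one particle,
    # then add zero-mean Normal noise to each component. For static
    # parameters this jitter is what avoids sample impoverishment.
    theta = rng.choice(particles)
    return tuple(t + rng.gauss(0.0, sd) for t in theta)

rng = random.Random(0)
cloud = [(10.0, 0.1)] * 100          # degenerate cloud: every particle equal
draws = [sample_kde(cloud, 0.01, rng) for _ in range(5)]
```

Even from a fully degenerate cloud, the KDE draws are all distinct, which is exactly the point of the perturbation.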
  • 17. Three types of approximation The distribution $\pi[c, Y(i-1)|X_{0:(i-1)}]$ is represented as a discrete distribution; in theory, as the number of particles N increases, the approximation improves. We perturb particles; as N increases, the perturbation decreases. Simulator: Exact: Gillespie. Approximate: SDE, but this can perform poorly for low molecule numbers. Approximate: ODE, but this ignores stochasticity. 11/19
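As a hedged sketch of the SDE option, one Euler-Maruyama step of the chemical Langevin approximation is shown below; the function, the step size and the guard against negative hazards are illustrative assumptions of mine, not from the slides.

```python
import math
import random

def cle_step(y, c, stoich, hazards, dt, rng):
    # Chemical Langevin step: over a small interval dt, reaction i fires
    # approximately h_i*dt + sqrt(h_i*dt)*N(0,1) times, and the state moves
    # by that (continuous) count times the net-change vector.
    h = hazards(y, c)
    y_new = list(y)
    for hi, s_i in zip(h, stoich):
        n_i = hi * dt + math.sqrt(max(hi, 0.0) * dt) * rng.gauss(0.0, 1.0)
        for j, s in enumerate(s_i):
            y_new[j] += s * n_i
    return y_new

# Immigration-death again, now with a continuous state; the max(..., 0.0)
# guard hints at why the SDE can misbehave at low molecule numbers.
rng = random.Random(3)
y = [10.0]
for _ in range(100):
    y = cle_step(y, [1.0, 0.1], [[1], [-1]],
                 lambda y, c: [c[0], c[1] * max(y[0], 0.0)], 0.1, rng)
```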
  • 20. Case study We wish to compare the hybrid scheme with inferences made assuming the "true" discrete, stochastic model. To allow such inferences, consider a toy example, an autoregulatory network: Immigration $R_1 : \emptyset \xrightarrow{c_1} Y_1$, $R_2 : \emptyset \xrightarrow{c_2} Y_2$; Death $R_3 : Y_1 \xrightarrow{c_3} \emptyset$, $R_4 : Y_2 \xrightarrow{c_4} \emptyset$; Interaction $R_5 : Y_1 + Y_2 \xrightarrow{c_5} 2Y_2$. Corrupt the data by constructing $X_i(t)|Y_i(t) \sim \mathrm{Poisson}(Y_i(t))$ if $Y_i(t) > 0$, $\mathrm{Bernoulli}(0.1)$ if $Y_i(t) = 0$, for each i = 1, 2, so that Y(t) is not observed anywhere. 12/19
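The corruption step above can be sketched directly. Since the Python standard library has no Poisson sampler, a Knuth-style inversion draw is included; that implementation choice is mine, not the slides'.

```python
import math
import random

def poisson(lam, rng):
    # Knuth's product-of-uniforms Poisson sampler (fine for modest lam).
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def corrupt(y, rng):
    # X_i(t) ~ Poisson(Y_i(t)) if Y_i(t) > 0, Bernoulli(0.1) if Y_i(t) = 0,
    # so the true state Y(t) is never observed directly.
    return tuple(poisson(yi, rng) if yi > 0 else int(rng.random() < 0.1)
                 for yi in y)

rng = random.Random(7)
x = corrupt((40, 0), rng)
```

The Bernoulli(0.1) branch means an observed zero carries genuine information about whether the species has died out.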
  • 21. Stochastic vs deterministic [Figure: simulated population trajectories against time (0 to 1000) in four panels, one for each of c2 = 10, 1.0, 0.1 and 0.01.] 13/19
  • 22. Stochastic vs deterministic [Figure: ODE steady state vs stochastic steady state, plotting Y2 against Y1 on log scales, for c2 = 10 x scale (the Y2 immigration rate) and c3 = 0.01 x scale (the Y1 death rate), with scale ranging over 10^-5 to 10^-1.] 14/19
  • 23. Autoregulatory network Initial conditions: Y1(0) = Y2(0) = 10. Rate constants: c1 = 10, c2 = 0.1, c3 = 0.1, c4 = 0.7 and c5 = 0.008. [Figure: population level against time (0 to 100); Y1 is the solid red line, Y2 the blue dashed line, and the solid points are the twenty-one noised points used for inference.] 15/19
  • 24. Results: N = 30,000 particles [Figure: posterior density estimates for c1 to c5 under the Gillespie, SDE and ODE simulators; vague priors used.] 16/19
  • 25. Summary Inferring the rate constants that govern discrete stochastic kinetic models is computationally challenging. It appears promising to consider an approximation of the model and perform exact inference using the approximate model. In this case study, the SDE and ODE models performed surprisingly well, although the ODE model had poor MCMC mixing properties. 17/19
  • 26. Future A hybrid forwards simulator (or indeed any forwards simulator) can be used as a proposal process inside a particle filter. However, for this model the Salis & Kaznessis hybrid simulator is slower than the standard Gillespie algorithm. A particle MCMC approach provides an exact inference procedure (see the recent Andrieu et al. JRSS B read paper). A preprint using this PMCMC approach is available; it also considers the hybrid simulator and the linear noise approximation. 18/19
  • 27. Boys, R. J., Wilkinson, D. J. and Kirkwood, T. B. L. (2008). Bayesian inference for a discretely observed stochastic kinetic model. Statistics and Computing 18. Gillespie, C. S. and Golightly, A. (2010). Bayesian inference for generalized stochastic population growth models with application to aphids. JRSS Series C 59(2). Heron, E. A., Finkenstadt, B. and Rand, D. A. (2007). Bayesian inference for dynamic transcriptional regulation; the Hes1 system as a case study. Bioinformatics 23. Salis, H. and Kaznessis, Y. (2005). Accurate hybrid simulation of a system of coupled biochemical reactions. Journal of Chemical Physics 122. Wilkinson, D. J. (2006). Stochastic Modelling for Systems Biology. Chapman & Hall/CRC Press. Contact details... email: colin.gillespie@ncl.ac.uk www: http://www.mas.ncl.ac.uk/~ncsg3/ 19/19