## Lazy ABC

I’ve just arxived a paper on “lazy ABC”. This post is a quick non-technical overview of the contents.

Approximate Bayesian computation, or ABC for short, is used for statistical inference of parameters for certain “intractable” applications. Likelihood-based inference (e.g. maximum likelihood and Bayesian methods) require calculating numerical values of the probability density of the data under the model for various parameter values. ABC can be used when this is not possible, but it is possible to simulate data given parameters. ABC simulates data for many parameter values and “accepts” those parameter values where there is a close match between simulated and observed data. Here is a simple version of ABC:

1. Simulate parameters $\theta^*$ from the prior density $\pi(\theta)$.
2. Simulate data $y^*$ from the model conditional on $\theta^*$.
3. Accept if $d(y^*,y_0) \leq \epsilon$.
4. Return to step 1 if fewer than N simulations have been performed.

Step 3 accepts if the simulated data $y^*$ is sufficiently close to the observed data $y_0$ based on a distance function $d(\cdot,\cdot)$ and a threshold $\epsilon$.

The accepted parameter values $\theta_1, \ldots, \theta_M$ can then be used for inference. In particular the posterior expectation of some function $h(\theta)$ has the Monte Carlo estimate

$M^{-1} \sum_{i=1}^M h(\theta_i).$

The results are approximations whose quality depends on choices including $\epsilon, N$ and $d$. For more details on ABC see the wikipedia page and the references there.

For particularly complicated models, the large numbers of simulations required by ABC can be slow. Lazy ABC tries to speed up the simulation stage by abandoning unpromising simulations early. This leads to a modified version of the above algorithm.

1. Simulate parameters $\theta^*$ from the prior density $\pi(\theta)$.
2. Conditional on $\theta^*$ perform a partial simulation to get $x^*$.
3. With probability $1-\alpha(\theta, x)$ stop early (go to step 6).
4. Complete the simulation to get $y^*$.
5. Accept if $d(y^*,y_0) \leq \epsilon$. If accepted, assign the weight $1/\alpha(\theta^*,x^*)$.
6. Return to step 1 if fewer than N simulations have been performed.

Lazy ABC gives weighted output $(\theta_1, w_1), \ldots, (\theta_M, w_M)$ which can be used to create estimates in the same way as importance sampling i.e. an estimate of the posterior expectation of $h(\theta)$ is

$\frac{\sum_1^M w_i h(\theta_i)}{\sum_1^M w_i}$.

The paper concentrates on the case when the $x$ variable in lazy ABC is a partial simulation of the dataset. For example if the data is state 1000 of a time series, then $x$ could be the first 100 states. Alternatively $x$ could be some other cheap source of information about the final dataset. For example several authors have recently looked at fitting a statistical model to some preliminary simulations and using this to produce an estimate of the likelihood. Lazy ABC will be somewhat slower than simply using this approximate model for inference, as further datasets must still be simulated, but has the advantage of not introducing approximation errors beyond those of standard ABC, as discussed next.

Theorem 1 proves the central property of lazy ABC. It targets the same distribution as standard ABC. That is, for a large number of iterations, the standard and lazy ABC estimates converge to the same quantities. This is true for any choice of $\alpha$ function (almost – a weak condition is given in the paper). However a poor choice of $\alpha$ will make convergence slow so that the algorithm is inefficient.

Theorem 2 gives the optimal choice of $\alpha$, which is

$\min(1, \lambda \left[ \frac{\gamma(\theta,x)}{\bar{T}_2(\theta,x)} \right]^{1/2}),$

using the following notation

$\gamma(\theta,x)$ is the probability that the simulation will be accepted in step 5 given $\theta$ and $x$
$\bar{T}_2(\theta, x)$ is the expected CPU time of steps 4 and 5 given $\theta$ and $x$
$\lambda$ is a positive value without a simple closed form expression

To use this in practice the paper suggests performing an initial pilot run of model simulations and using this to estimate $\gamma$ and $\bar{T}_2$. It’s difficult to estimate these based on $(\theta, x)$ as this is typically high dimensional. So low dimensional “decision statistics” $\phi(\theta, x)$ are introduced. The optimal $\alpha$ based on this is

$\min(1, \lambda \left[ \frac{\gamma(\phi)}{\bar{T}_2(\phi)} \right]^{1/2}),$

using similar notation to above. Once $\gamma$ and $\bar{T}_2$ have been estimated, $\lambda$ is chosen by numerical maximisation of an estimate of the algorithm’s efficiency based on the pilot run.

The paper illustrates this by an application to spatial extremes (taken from Erhardt and Smith 2012). In various simulation studies the increase in efficiency (roughly speaking, the reciprocal of the time saving) is 2-9 times.

The paper includes some extensions to this basic framework, including multiple stopping decisions and a version of ABC importance sampling. Hopefully it’s possible to extend the idea to MCMC and SMC algorithms as well.

Lazy ABC isn’t a magic bullet but hopefully adds a useful extra ABC tool. Two uses are illustrated in the paper. Firstly, one may wish to consider stopping all simulations early in the hope of achieving major time savings. The drawback is that it may be difficult to estimate the various quantities required for tuning, so some implementation effort is required. Alternatively one may consider stopping when a particular problematic event occurs – in the paper this is when a particularly slow simulation method needs to be used. Tuning is much simpler here as there is a binary decision statistic. This latter approach aims to avoid inefficient behavior of ABC (e.g. very slow simulation time on some parameter regions which turn out to contribute little) in a principled way.