## Lazy ABC

June 2, 2014

I’ve just arXived a paper on “lazy ABC”. This post is a quick non-technical overview of the contents.

Approximate Bayesian computation, or ABC for short, is used for statistical inference of parameters in certain “intractable” applications. Likelihood-based inference (e.g. maximum likelihood and Bayesian methods) requires calculating numerical values of the probability density of the data under the model for various parameter values. ABC can be used when this is not possible but it is still possible to simulate data given parameters. ABC simulates data for many parameter values and “accepts” those parameter values where there is a close match between simulated and observed data. Here is a simple version of ABC:

1. Simulate parameters $\theta^*$ from the prior density $\pi(\theta)$.
2. Simulate data $y^*$ from the model conditional on $\theta^*$.
3. Accept if $d(y^*,y_0) \leq \epsilon$.
4. Return to step 1 if fewer than $N$ simulations have been performed.

Step 3 accepts if the simulated data $y^*$ is sufficiently close to the observed data $y_0$ based on a distance function $d(\cdot,\cdot)$ and a threshold $\epsilon$.

The accepted parameter values $\theta_1, \ldots, \theta_M$ can then be used for inference. In particular the posterior expectation of some function $h(\theta)$ has the Monte Carlo estimate

$M^{-1} \sum_{i=1}^M h(\theta_i).$

The results are approximations whose quality depends on choices including $\epsilon$, $N$ and $d$. For more details on ABC see the Wikipedia page and the references there.
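The rejection sampler above is only a few lines of code. Here is a minimal Python sketch (the normal-mean toy model and all names are my own illustration, not from the paper):

```python
import random

def abc_rejection(n_sims, prior_sample, simulate, distance, y_obs, eps):
    """Simple ABC rejection sampling: keep those parameter values whose
    simulated data fall within eps of the observed data."""
    accepted = []
    for _ in range(n_sims):
        theta = prior_sample()          # step 1: draw from the prior
        y = simulate(theta)             # step 2: simulate data given theta
        if distance(y, y_obs) <= eps:   # step 3: accept if close enough
            accepted.append(theta)
    return accepted

# Toy example: infer the mean of a normal distribution (known sd = 1)
# from an observed sample mean of 2.0, based on 20 observations.
random.seed(1)
post = abc_rejection(
    n_sims=5000,
    prior_sample=lambda: random.uniform(-5, 5),
    simulate=lambda th: sum(random.gauss(th, 1) for _ in range(20)) / 20,
    distance=lambda y, y0: abs(y - y0),
    y_obs=2.0,
    eps=0.2,
)
post_mean = sum(post) / len(post)  # Monte Carlo estimate of E[theta | data]
```

The accepted values then feed directly into the Monte Carlo estimate above.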

For particularly complicated models, the large numbers of simulations required by ABC can be slow. Lazy ABC tries to speed up the simulation stage by abandoning unpromising simulations early. This leads to a modified version of the above algorithm.

1. Simulate parameters $\theta^*$ from the prior density $\pi(\theta)$.
2. Conditional on $\theta^*$ perform a partial simulation to get $x^*$.
3. With probability $1-\alpha(\theta^*, x^*)$ stop early (go to step 6).
4. Complete the simulation to get $y^*$.
5. Accept if $d(y^*,y_0) \leq \epsilon$. If accepted, assign the weight $1/\alpha(\theta^*,x^*)$.
6. Return to step 1 if fewer than $N$ simulations have been performed.

Lazy ABC gives weighted output $(\theta_1, w_1), \ldots, (\theta_M, w_M)$ which can be used to create estimates in the same way as importance sampling i.e. an estimate of the posterior expectation of $h(\theta)$ is

$\frac{\sum_{i=1}^M w_i h(\theta_i)}{\sum_{i=1}^M w_i}.$
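Here is a minimal Python sketch of the modified algorithm (the random-walk model, the particular $\alpha$ function and all names are hypothetical illustrations, not from the paper):

```python
import random

def lazy_abc(n_sims, prior_sample, simulate_partial, simulate_rest,
             alpha, distance, y_obs, eps):
    """Lazy ABC: after a cheap partial simulation x, continue with
    probability alpha(theta, x); accepted draws get weight 1/alpha."""
    out = []
    for _ in range(n_sims):
        theta = prior_sample()              # step 1
        x = simulate_partial(theta)         # step 2: partial simulation
        a = alpha(theta, x)
        if random.random() >= a:            # step 3: stop early
            continue
        y = simulate_rest(theta, x)         # step 4: complete the simulation
        if distance(y, y_obs) <= eps:       # step 5: accept, with weight 1/a
            out.append((theta, 1.0 / a))
    return out

def weighted_mean(pairs):
    """Importance-sampling style estimate of the posterior mean."""
    return sum(w * th for th, w in pairs) / sum(w for _, w in pairs)

# Toy example: theta is the drift of a 20-step Gaussian random walk and
# the observed data is its final state. The partial simulation is the
# first 10 steps; walks already far off course are usually abandoned.
random.seed(2)
y_obs = 5.0
pairs = lazy_abc(
    n_sims=20000,
    prior_sample=lambda: random.uniform(-1, 1),
    simulate_partial=lambda th: sum(random.gauss(th, 1) for _ in range(10)),
    simulate_rest=lambda th, x: x + sum(random.gauss(th, 1) for _ in range(10)),
    alpha=lambda th, x: 1.0 if abs(x - y_obs / 2) < 3 else 0.1,
    distance=lambda y, y0: abs(y - y0),
    y_obs=y_obs,
    eps=1.0,
)
```

The weighted output recovers roughly the same posterior mean as standard ABC would, while skipping most of the unpromising second-stage simulations.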

The paper concentrates on the case when the $x$ variable in lazy ABC is a partial simulation of the dataset. For example if the data is state 1000 of a time series, then $x$ could be the first 100 states. Alternatively $x$ could be some other cheap source of information about the final dataset. For example several authors have recently looked at fitting a statistical model to some preliminary simulations and using this to produce an estimate of the likelihood. Lazy ABC will be somewhat slower than simply using this approximate model for inference, as further datasets must still be simulated, but has the advantage of not introducing approximation errors beyond those of standard ABC, as discussed next.

Theorem 1 proves the central property of lazy ABC: it targets the same distribution as standard ABC. That is, for a large number of iterations, the standard and lazy ABC estimates converge to the same quantities. This is true for almost any choice of $\alpha$ function (a weak condition is given in the paper). However a poor choice of $\alpha$ will make convergence slow, so that the algorithm is inefficient.

Theorem 2 gives the optimal choice of $\alpha$, which is

$\min\left(1, \lambda \left[ \frac{\gamma(\theta,x)}{\bar{T}_2(\theta,x)} \right]^{1/2}\right),$

using the following notation:

• $\gamma(\theta,x)$ is the probability that the simulation will be accepted in step 5, given $\theta$ and $x$;
• $\bar{T}_2(\theta, x)$ is the expected CPU time of steps 4 and 5, given $\theta$ and $x$;
• $\lambda$ is a positive value without a simple closed form expression.

To use this in practice the paper suggests performing an initial pilot run of model simulations and using this to estimate $\gamma$ and $\bar{T}_2$. It’s difficult to estimate these as functions of $(\theta, x)$, as this is typically high dimensional. So low dimensional “decision statistics” $\phi(\theta, x)$ are introduced. The optimal $\alpha$ based on these is

$\min\left(1, \lambda \left[ \frac{\gamma(\phi)}{\bar{T}_2(\phi)} \right]^{1/2}\right),$

using similar notation to above. Once $\gamma$ and $\bar{T}_2$ have been estimated, $\lambda$ is chosen by numerical maximisation of an estimate of the algorithm’s efficiency based on the pilot run.
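As a rough sketch of this tuning step (my own simplified reading, with illustrative names; the paper's actual efficiency criterion may differ in detail), one can evaluate an estimated efficiency (effective sample size per unit CPU time) over a grid of $\lambda$ values using pilot-run records:

```python
import math

def tune_alpha(pilot, lam_grid):
    """Choose lambda by maximising an estimated efficiency over a grid.

    pilot: list of (gamma, t1, t2) records from a pilot run, where gamma
    is the estimated acceptance probability and t1, t2 are the CPU times
    of the first and second simulation stages.

    Per iteration: E[weight] = gamma, E[weight^2] = gamma / alpha, and
    E[time] = t1 + alpha * t2. Efficiency is taken proportional to
    E[weight]^2 / (E[weight^2] * E[time])."""
    def alpha(gamma, t2, lam):
        return min(1.0, lam * math.sqrt(gamma / t2))

    best_lam, best_eff = None, -1.0
    for lam in lam_grid:
        alphas = [alpha(g, t2, lam) for g, _, t2 in pilot]
        sum_w = sum(g for g, _, _ in pilot)
        sum_w2 = sum(g / a for (g, _, _), a in zip(pilot, alphas))
        sum_t = sum(t1 + a * t2 for (_, t1, t2), a in zip(pilot, alphas))
        eff = sum_w ** 2 / (sum_w2 * sum_t)
        if eff > best_eff:
            best_lam, best_eff = lam, eff
    return best_lam

# Pilot run with two kinds of simulation: cheap ones that are often
# accepted, and very expensive ones that almost never are. A moderate
# lambda, abandoning most of the expensive simulations, wins here.
pilot = [(0.5, 1.0, 1.0)] * 10 + [(0.001, 1.0, 100.0)] * 10
best = tune_alpha(pilot, [0.01, 0.1, 1.0, 10.0, 1000.0])
```

Very large $\lambda$ recovers standard ABC (every $\alpha = 1$), so the grid search can only improve on it within this estimate.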

The paper illustrates this with an application to spatial extremes (taken from Erhardt and Smith 2012). In various simulation studies the increase in efficiency (roughly speaking, the reciprocal of the time saving) is 2–9 times.

The paper includes some extensions to this basic framework, including multiple stopping decisions and a version of ABC importance sampling. Hopefully it’s possible to extend the idea to MCMC and SMC algorithms as well.

Lazy ABC isn’t a magic bullet but hopefully adds a useful extra ABC tool. Two uses are illustrated in the paper. Firstly, one may wish to consider stopping all simulations early in the hope of achieving major time savings. The drawback is that it may be difficult to estimate the various quantities required for tuning, so some implementation effort is required. Alternatively one may consider stopping only when a particular problematic event occurs – in the paper this is when a particularly slow simulation method needs to be used. Tuning is much simpler here as there is a binary decision statistic. This latter approach aims to avoid inefficient behaviour of ABC (e.g. very slow simulation times in some parameter regions which turn out to contribute little) in a principled way.

## Semi-automatic ABC: ordinary meeting

December 19, 2011

Last week I had the pleasure of taking part in an RSS ordinary meeting on my paper with Paul Fearnhead and I thought I’d take the chance to write up my notes on it. They were a bit hurried so apologies if I misrepresent what anyone said!

The afternoon began with a Young Statistician’s Section pre-ordinary meeting. Michael Stumpf gave an introductory talk about ABC, starting from biological applications with intractable models to motivate the use of ABC. On our paper he commented that when using noisy ABC we “actively destroy” part of the data, and made a link between regression adjustment and Richard Wilkinson’s ideas that ABC is exact given measurement (or model) error. Then I gave a (hopefully) accessible introductory talk on the paper, the slides of which are here.

The main meeting began with Paul Fearnhead presenting the paper, including an overview of likelihood-free methods in general and how ABC relates to them. Mark Beaumont proposed the vote of thanks. Amongst other comments he asked what information is lost between having sufficient summary statistics and our idea of calibration, and what other features of the posterior we could get by using other loss functions rather than quadratic loss. These comments were echoed by many of the other discussants, including the seconder, Christian Robert (whose slides are available on his blog). He also questioned whether our asymptotic result on the curse of dimensionality (Lemma 1) really deserves the interpretation we give it of motivating minimally low dimensional summary statistics, and mentioned that in practice using more summary statistics than parameters seemed useful. Paul’s reply mentioned that we’ve had some success looking at regression methods, in particular sliced inverse regression, which can provide multiple summaries for each response (some work on this is in my thesis).

The discussants began with Richard Wilkinson, who gave a very interesting contribution on the difference between our use of calibration and its standard use in the literature (going back to Dawid’s prequential statistics). The idea is that we are calibrated against our prior beliefs rather than reality. Julien Cornebise mentioned that ABC forms a non-parametric estimate of the likelihood, while Simon Wood’s synthetic likelihood approach makes a Gaussian estimate, and wondered whether this idea could be used within ABC. Mark Girolami asked if it may sometimes be of more intrinsic interest to find the posterior conditional on certain summaries rather than the full data, an idea also proposed in Simon Wood’s Nature paper. Anthony Lee presented some potentially very interesting parallel MCMC algorithms for ABC, although the details were too much for me to take in! Simon White talked about potential methods of using noisy ABC for sequential inference, specialising in iid data, rather than state space models as we looked at in the paper. Xia Yingcun started by describing ABC-like methods going back to Student! He wondered how our approach would do on a continuous time version of the Ricker model for a blowfly dataset, as analysed in Simon Wood’s paper.

Christophe Andrieu noted that intractable models typically have the form $Y=\phi(\theta,U)$ i.e. the data is a function of parameters $\theta$ and random variables $U$ for which the distribution is known. This opens up the possibility of latent variable approaches (One related published work, specialised to some particular models, is Peter Neal’s idea of coupled ABC). Simon Wood commented on difficulties stably reproducing our results for the Ricker model taken from his Nature paper, but also some success in using the general idea of a regression on the parameter values to produce summary statistics. I didn’t catch the name of the final contributor, but he suggested focussing on inference for marginal parameter distributions, and using the notion of marginal sufficiency (not something I’d seen before).

There wasn’t time to read the written contributions, only a tantalising list of contributors, including Andrew Gelman, Ajay Jasra, Michael Blum and Olivier Francois, Brad Efron and Kanti Mardia.

Overall it was a great experience as a young researcher to get this much feedback on research I’d contributed to, and I look forward to working on the response!

## Semi-automatic ABC – list of typos

December 8, 2011

I’m very pleased that next week my paper with Paul Fearnhead is being discussed at the Royal Statistical Society. Unfortunately there are still several typos in the manuscript so I’ll keep an up to date list of them here. This is expanded from a comment I wrote on Xian Robert’s blog – thanks to him and others for spotting these. I’ll post up a link to my summary slides on the paper for the pre-ordinary meeting when I’ve finished writing them…

Typos:

• We write $K[\ldots]/h$ in several places where $K[\ldots/h]$ would be correct.
• The definition of Algorithm 3 on page 6 should define $s_\text{obs} = S(y_\text{obs}) + hx$, not $h(x)$.
• Theorem 1 should state “Algorithm 3 produces an ABC posterior that is calibrated”, not Algorithm 1.
• In the last equation in the proof of Theorem 2, the final $\theta$ should be $\theta_0$.
• In the proof of Theorem 2, there is a $1/h$ factor before an integral in two places. This should be $1/h^d$ (where $d$ is the dimension of the summary statistics).
• On line 44 of page 4 we write $h(\theta)$, which should be $a(\theta)$.

## gk R package

July 12, 2011

I’ve started tidying up some code from my thesis to create an R package called “gk”, containing functions for the g-and-k distribution.  This is defined by its quantile function (i.e. its inverse cdf) $Q(\Phi^{-1}(u))$, where $\Phi^{-1}$ is the inverse of the standard normal cdf and
$Q(z) = A + B \left( 1 + c \frac{1-\exp(-gz)}{1+\exp(-gz)} \right) (1+z^2)^k z. \qquad (1)$
To explain, if $u$ is a random variable with a uniform distribution on $[0,1]$, then $z=\Phi^{-1}(u)$ follows a standard normal distribution and $Q(z)$ follows the g-and-k distribution. Indeed this gives a straightforward method (the inversion method) for simulation.

However, the cdf and density functions are not available in closed form making likelihood based inference hard (although possible through numerical inversion of (1)).  Inference by ABC methods is possible, and this has been much studied, both from interest in using the g-and-k distribution and as a testing ground for research in ABC methods (References include Allingham et al, Drovandi and Pettitt, and McVinish.)

The package contains distributional functions in the standard R format, plus some helper functions for, and applications of, these.  These functions should be useful for exploratory implementations of ABC methods.  No such code is included (for now!) as my thesis ABC g-and-k code is messy and uses a lot of C routines.  One application function that is included is gkMLE, which computes maximum likelihood estimates under a model of iid g-and-k observations.

A few technical notes.  The helper functions z2gk and gk2z are the main workhorse functions.  z2gk converts standard normal quantiles to g-and-k quantiles by equation (1).  gk2z performs the inverse operation numerically (a technical point is that this required providing upper and lower bounds on $Q^{-1}(x)$; some rough work detailing the bounds I used is here).  The pgk, dgk and gkMLE functions rely on gk2z, so they are subject to numerical error and only limited confidence should be placed on their accuracy.
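To illustrate equation (1) and the inversion that gk2z performs, here is a rough Python sketch (the actual package is written in R; the function names below mirror its z2gk and gk2z, but the code, including the use of plain bisection rather than the tailored bounds mentioned above, is my own):

```python
import math

def z2gk(z, A, B, g, k, c=0.8):
    """Equation (1): map a standard normal quantile z to a g-and-k
    quantile. The parameter c is conventionally fixed at 0.8."""
    return (A + B * (1 + c * (1 - math.exp(-g * z)) / (1 + math.exp(-g * z)))
            * (1 + z * z) ** k * z)

def gk2z(x, A, B, g, k, c=0.8, lo=-15.0, hi=15.0, tol=1e-12):
    """Numerically invert equation (1) by bisection, assuming Q is
    increasing on [lo, hi] (true for standard parameter ranges)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if z2gk(mid, A, B, g, k, c) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $g = k = 0$ the distribution reduces to $N(A, B^2)$, which gives a quick sanity check on any implementation.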