We are organising a one-day meeting on recent advances in ABC methods in
Paris, at Université Paris Dauphine, on June 26, 2009, as the final step
of our ANR 2005-2008 Misgepop project, with financial support from
Université Paris Dauphine (BQR) and from the GIS
"Sciences de la Décision" X-HEC-ENSAE. There
have been so many advances in this area in the past year or so
that a single day is obviously too short to cover the whole field, but
it should nonetheless highlight those advances and bring the (local)
communities together. Further and longer meetings may also stem from this one.
The programme of the workshop is:
- 9h30-10h00:
Welcome coffee
- 10h00-10h30: Arnaud
Doucet, Institute
of Statistical Mathematics, Tokyo, and Ajay Jasra, Imperial
College London, "An
Adaptive Sequential Monte Carlo Method for Approximate Bayesian
Computation" [slides]
Approximate Bayesian computation (ABC) is a
popular approach to address inference problems where the likelihood
function is intractable, or expensive to calculate. To improve over
Markov chain Monte Carlo (MCMC) implementations of ABC, the use of
sequential Monte Carlo (SMC) methods has recently been suggested.
Effective SMC algorithms that are currently available for ABC have a
computational complexity that is quadratic in the number of Monte Carlo
samples and require the careful choice of simulation parameters. In
this article an adaptive SMC algorithm is proposed which admits a
computational complexity that is linear in the number of samples and
determines the simulation parameters on the fly. We demonstrate our
algorithm on both a toy and a population genetics example.
- 10h30-11h00: Tina
Toni,
Imperial College London, "ABC
SMC for dynamical systems" [slides]
Approximate Bayesian Computation (ABC)
methods can be used in situations where the evaluation of the
likelihood is computationally prohibitive. They are thus ideally suited
for the analysis of complex dynamical systems (Toni et al.
2009), where knowledge of the full (approximate) posterior is often
essential. Here we discuss improvements to an ABC approach, which is
based on sequential Monte Carlo (SMC). We are particularly interested
in applying ABC SMC to the increasingly important model
selection problem. We will discuss how ABC SMC can be adapted for model
selection for dynamical systems given a set of candidate models. In
particular we will discuss how we can balance the "fit" to the data
with the complexity of the simulation model. Being based on repeated
simulation, ABC SMC is computationally expensive for models with many
parameters (such as those considered in systems biology). We present an
exploration of different perturbation kernels, which can improve the
computational efficiency by exploring large-dimensional parameter
spaces, yet still allow us to address the issue of maintaining particle
diversity to obtain good approximations to the posterior distribution.
- 11h00-11h30: Marc
Briers,
QinetiQ Malvern, "Marginal and joint space representations within ABC, and the issue of bias" [slides]
In
this talk we will discuss two representations of the target
distribution within an ABC context (relating to a marginal and joint
space representation of the target distribution). We will also discuss
the bias arguments related to the paper by Sisson et al. (2007). We will
establish a set of unbiased ABC-SMC based algorithms, and finally
provide an application.
- 11h30-12h00: Christoph
Leuenberger, Université de Fribourg, "ABC and Model Selection in
Population Genetics" [slides]
A
key innovation to ABC was the use of a post-sampling regression
adjustment, allowing larger tolerance values and as such shifting
computation time to realistic orders of magnitude
(Beaumont et al.). In my talk I propose a reformulation of the
regression adjustment in terms of a General Linear Model
(GLM). This allows a natural integration into the theoretical framework
of Bayesian statistics and the use of its methods, including model
selection via Bayes factors. As an illustration, the
proposed methodology is applied to the question of population
subdivision among western chimpanzees.
- 12h00-12h30: Oliver
Ratmann,
Imperial College London, "Model
Criticism based on likelihood-free inference, with an application to protein
network evolution" [slides]
In
many areas of computational biology, the likelihood of a scientific
model is intractable, typically because interesting models are highly
complex. This hampers scientific progress in terms of iterative data
acquisition, parameter inference, model checking and model refinement
within a Bayesian framework. We provide a statistical interpretation to
current developments in likelihood-free Bayesian inference that
explicitly accounts for discrepancies between the model and the data,
termed Approximate Bayesian Computation under model uncertainty (ABCµ)
(1). We augment the likelihood of the data with unknown error terms
that correspond to freely chosen checking functions, and describe
possible Monte Carlo strategies for sampling from the associated joint
posterior distribution without the need to evaluate the likelihood.
We discuss the benefit of incorporating model diagnostics within an ABC
framework, and demonstrate how this method diagnoses model mismatch and
guides model refinement by contrasting three qualitative models of
protein network evolution to the protein interaction datasets of
Helicobacter pylori and Treponema pallidum. The presented
methods will be useful in the initial stages of model and data
exploration, and in particular to efficiently scrutinize several models
for which the likelihood is intractable by direct inspection of their
summary errors, prior to more formal analyses.
- 12h30-13h00: Jean-Michel
Marin, Université de Montpellier 2, "ABC methods for model choice in
Gibbs random fields" [slides]
The core idea is that, for Gibbs random
fields and in
particular for Ising models, when comparing several neighbourhood
structures, the computation of the posterior probabilities of the
models under competition can be carried out by likelihood-free
simulation techniques (ABC).
The turning point for this resolution is that, due to the specific
structure of
Gibbs random field distributions, there exists a sufficient statistic
across models which allows for an exact (rather than approximate)
simulation from the posterior probabilities of the models. Obviously,
when the structures grow more complex, it becomes necessary to
introduce a true ABC step with a tolerance threshold in
order to avoid running the algorithm for too long. Our toy example
shows that the accuracy of the approximation of the Bayes factor can be
greatly improved by resorting to the original ABC approach, since it
allows for the inclusion of many more simulations. In a biophysical
application to the choice of a folding structure for two proteins, we
also demonstrate that we can implement the ABC solution on realistic
datasets and, in the examples processed there, that the Bayes factors
allow for a ranking that more standard methods (FROST, TM-score) do not provide.
- 13h00-13h30:
Lunch break (bring your own
sandwich! Tea and coffee will be available)
- 13h30-14h00: David
Balding and Matt
Nunes,
Imperial College London, "Selecting
summary statistics for ABC" [slides]
Recently Joyce and Marjoram ("Approximately
sufficient statistics and Bayesian computation", Stat. Appl. Genet.
Mol. Biol. 7(1):26, 2008) developed a sequential scheme for selecting
the best subset of summary statistics to use in ABC, given a set of
candidate summary statistics. Their approach was based on a notion of
approximate sufficiency. We will report the results of our
investigation seeking ways to improve on their scheme, using
Kullback-Leibler divergence.
- 14h00-14h30: Paul
Fearnhead,
University of Lancaster, "Choice
of Summary Statistics for ABC" [slides]
We will look at how simulation can be used
to produce informative summary statistics within ABC. The issue will be
investigated both theoretically and via simulation, including
comparisons with examples of ABC taken from the literature.
- 14h30-15h00: Mark
Beaumont,
University of Reading, "ABC
and hierarchical models: summary statistics, algorithms, and
applications in population genetics" [slides]
Recently a group of techniques, variously
called likelihood-free
inference, or Approximate Bayesian Computation (ABC), have been quite
widely applied in population genetics. These methods typically require
the data to be compressed into summary statistics. In a hierarchical
setting one may be interested both in hyper-parameters and parameters,
and there may be very many of the latter - for example, in a genetic
model, these may be parameters describing each of many loci or
populations. This poses a problem for ABC in that one then
requires
summary statistics for each locus, and, if used naively, a
consequent
problem in conditional density estimation. We develop a
general method
for addressing these problems efficiently, and we describe recent work
in which the ABC method can be used to detect loci under local
selection.
- 15h00-15h30: Michael
Blum,
TIMC, Grenoble, "Approximate
Bayesian Computation: a non-parametric perspective" [slides]
We present Approximate Bayesian Computation
as a technique of inference that relies on stochastic simulations and
non-parametric statistics. For both the original estimator of the
posterior distribution based on kernel smoothing and a refined version
of the estimator based on a linear adjustment, we give their asymptotic
bias and variance. Additionally, we introduce an original estimator of
the posterior distribution based on quadratic adjustment and we show
that its bias contains a smaller number of terms compared to the
estimator with linear adjustment. Although we find that the estimators
with adjustment are not universally superior to the estimator based on
kernel smoothing, we find that they can achieve better performance when
there is a nearly homoscedastic relationship between the summary
statistics and the parameter. Last, we show that both asymptotic
results and numerical simulations emphasize the importance of the curse
of dimensionality in Approximate Bayesian Computation.
- 15h30-16h00: Tea
(and coffee) break
- 16h00-16h30: Richard
Wilkinson, University of Sheffield, "The error in ABC" [slides]
The approximation error in ABC algorithms
can be understood by the consideration of an additive error term, where
the distribution of this error can be inferred from the choice of
metric and acceptance kernel. Once we are aware of this we can begin to
think more carefully about what model error we expect for our models,
and consequently what metric, tolerance and summaries we would ideally
use. There may also be the opportunity to rewrite some models so that
sampling can be done by the ABC rejection step, thus raising the
possibility of exact inference in some cases.
- 16h30-17h00: Christophe
Andrieu, University of Bristol, "ABC and exact approximations"
- 17h00-17h30: Olivier
Francois, TIMC, Grenoble, "Non-linear regression models for
Approximate Bayesian Computation" [slides]
Approximate Bayesian inference on the basis
of summary statistics is well-suited to complex problems for which the
likelihood is either mathematically or computationally intractable.
However, the methods that use rejection suffer from the curse of
dimensionality when the number of summary statistics is increased. Here
we propose a machine-learning approach to the estimation of the
posterior density by introducing two innovations. The new method fits a
nonlinear conditional heteroscedastic regression model on the summary
statistics by using a penalized least-squares method, and then
adaptively improves estimation by using importance sampling. We also
investigate the choice of the regularization parameter and the
tolerance rate in the ABC algorithm with a version of the Deviance
Information Criterion. The new algorithm is compared to the
state-of-the-art approximate Bayesian methods, and achieves
considerable reduction of the computational burden in two examples of
inference in statistical genetics and in a queueing model.
- 17h30-18h00: Arnaud
Estoup,
CBGP, INRA, Montpellier, "From
theory to application: DIYABC, a user-friendly program to infer complex
population histories using Approximate Bayesian Computation" [slides]
DIYABC is a computer program with a
graphical user interface and a fully clickable environment. It allows
population biologists to make inference based on Approximate Bayesian
Computation (ABC), in which scenarios can be customized by the user to
fit many complex situations involving any number of populations and
samples. Such scenarios involve any combination of population
divergences, admixtures and population size changes. DIYABC can be used
to compare competing scenarios, estimate parameters for one or more
scenarios, and compute bias and precision measures for a given scenario
and known values of parameters (the current version applies to unlinked
microsatellite data).
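For attendees new to the area, the basic rejection step that most of the talks above refine can be sketched as follows. This is only a minimal illustration under assumptions of our own, not the method of any speaker: the toy Gaussian model, the uniform prior, the sample-mean summary statistic, the tolerance value and the function name `abc_rejection` are all hypothetical choices made for the example.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, distance,
                  tolerance, n_draws, rng):
    """Keep the prior draws whose simulated summary statistic falls
    within `tolerance` of the observed summary statistic."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)          # draw a parameter from the prior
        simulated = simulator(theta, rng)   # simulate data; no likelihood needed
        if distance(summary(simulated), s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


# Hypothetical toy problem: unknown mean of a unit-variance Gaussian.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior_sample = abc_rejection(
    observed=observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),   # assumed flat prior
    simulator=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,                        # assumed summary statistic
    distance=lambda a, b: abs(a - b),
    tolerance=0.1,                                   # assumed tolerance
    n_draws=20000,
    rng=rng,
)

print(len(posterior_sample), statistics.fmean(posterior_sample))
```

In this sketch a smaller tolerance gives a closer approximation to the posterior at the cost of fewer accepted draws, which is the trade-off that the SMC, regression-adjustment and summary-selection talks address in different ways.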
This is definitely
a marathon schedule (!), but it should allow attendees
from France or nearby countries to make the round trip within the same day (if needed).
This
meeting is free, with no registration, and open to anyone interested.
The talks will take
place in Amphitheater 2-3 of Université Paris Dauphine, located on the
second floor of the (single) university building. Université Paris
Dauphine is located in
downtown Paris (Porte Dauphine)
and is accessible by metro
(e.g., stops Porte Dauphine, or
Avenue Foch) as
explained there.
Contact
Christian Robert at
bayesianstatistics[(à)]gmail.com for further
practical information (but
the programme is now complete, no more talks, sorry!)