
Abstracts

(updated 14 September 2012)

Monday 24 September

12:00 - 13:00 Lunch and Registration

13:00 Sonia Petrone (Bocconi) - "More informative conditional density estimation through mixture models"

Bayesian inference in mixture models received a great impetus from the influential paper by Richardson and Green (JRSS Ser. B, 1997) and, more recently, from the development of Bayesian nonparametric models. A thoughtful discussion of these approaches is in Green and Richardson (Scand. J. Stat., 2001). Bayesian mixture models are now popular tools in a wide range of statistical problems, from density estimation and nonparametric regression to more complex applications. In this talk, we examine the predictive performance of Dirichlet process mixture models for nonparametric regression and conditional density estimation. We propose a solution based on the Enriched Dirichlet process that overcomes issues encountered in the presence of a large number of covariates by making better use of the information provided by the data. Our proposal maintains a simple allocation rule, so that computations remain relatively simple. Advantages are shown through both predictive equations and examples, including an application to the diagnosis of Alzheimer's disease. (This is joint research with Sara Wade, David Dunson and Lorenzo Trippa.)
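
The Enriched Dirichlet process itself is beyond a short snippet, but the baseline it improves on is easy to sketch. The following fragment (a minimal illustration using scikit-learn's truncated variational DP mixture, not the talk's samplers; all data and settings are made up) fits a joint DP mixture of Gaussians and conditions each component to obtain a conditional density estimate of p(y | x):

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 500)
y = np.sin(x) + 0.1 * rng.standard_normal(500)           # toy regression data

dpm = BayesianGaussianMixture(
    n_components=20,                                     # DP truncation level
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
).fit(np.column_stack([x, y]))

def conditional_density(y_grid, x0):
    """Estimate p(y | x = x0) by conditioning each Gaussian component."""
    mx, my = dpm.means_[:, 0], dpm.means_[:, 1]
    sxx = dpm.covariances_[:, 0, 0]
    sxy = dpm.covariances_[:, 0, 1]
    syy = dpm.covariances_[:, 1, 1]
    logw = np.log(dpm.weights_) + norm.logpdf(x0, mx, np.sqrt(sxx))
    w = np.exp(logw - np.logaddexp.reduce(logw))         # w_k(x0), normalised
    cmean = my + sxy / sxx * (x0 - mx)                   # conditional means
    cvar = syy - sxy ** 2 / sxx                          # conditional variances
    return sum(w[k] * norm.pdf(y_grid, cmean[k], np.sqrt(cvar[k]))
               for k in range(dpm.n_components))

print(conditional_density(np.linspace(-2, 2, 5), x0=1.0))
```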

13:50 Julia Mortera (Università Roma Tre) - "Probabilistic Expert Systems for Forensic Identification"

Problems of forensic identification from DNA profile evidence can become extremely challenging, both logically and computationally, in the presence of such complicating features as missing data on individuals, mutation, mixed trace evidence, laboratory contamination and artifacts as well as violations of standard assumptions about founding genes. In recent years it has been shown how object-oriented Bayesian networks can be used to represent and solve such problems. This architecture proves particularly natural and useful for complex forensic identification problems. I will describe a "construction set" of fundamental networks that can be pieced together, as required, to represent and solve a wide variety of problems arising in forensic genetics. Probabilistic expert systems can be used to analyse forensic identification problems involving DNA mixture traces using peak area information. This information can be exploited to make inferences regarding the genetic profiles of unknown contributors to the mixture, or for evaluating the evidential strength for a hypothesis that DNA from a particular person is present in the mixture. We illustrate the use of the networks on published criminal casework examples. (Part of this work is joint with Peter Green.)
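
As a toy illustration of the kind of evidential calculation these networks automate (a hedged sketch only: single-donor trace, no mixtures, peak areas, mutation or silent alleles, and invented allele frequencies), the match likelihood ratio under Hardy-Weinberg equilibrium can be computed directly:

```python
# LR for "the suspect left the single-donor trace" (Hp) versus
# "an unrelated member of the population did" (Hd), assuming
# Hardy-Weinberg equilibrium and independent markers.
freqs = {                                   # made-up allele frequencies
    "D3S1358": {"15": 0.25, "16": 0.22},
    "VWA":     {"17": 0.28},
    "FGA":     {"21": 0.17, "24": 0.14},
}
trace = {"D3S1358": ("15", "16"), "VWA": ("17", "17"), "FGA": ("21", "24")}

def genotype_prob(marker, genotype):
    a, b = genotype
    pa, pb = freqs[marker][a], freqs[marker][b]
    return pa * pb if a == b else 2 * pa * pb   # Hardy-Weinberg proportions

lr = 1.0
for marker, genotype in trace.items():
    # Under Hp the suspect's matching genotype has probability 1; under Hd
    # a random person carries it with its Hardy-Weinberg probability.
    lr /= genotype_prob(marker, genotype)
print(f"likelihood ratio = {lr:,.0f}")
```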

14:15 Coffee break

14:45 Alix Gitelman (Oregon State University) - "Multivariate Spatial Models with Continuous and Discrete Components"

For multivariate observations collected on a spatial landscape, some associations among observations may represent spatial dependence and others may represent regression-type relationships. We specifically consider a situation in which a spatially correlated Gaussian variable is used as a predictor for a spatially correlated Bernoulli variable. In this setting, we develop a method for specifying the joint distribution of the variables that accounts for the spatial dependence in both variables across sites and the regression relationship between them at each observed location. We use a generalized tree network to partition the joint distribution and a Gaussian copula to model the dependence structures in the data. The copula separates variables' dependencies from their marginal distributions, which is particularly useful for the spatially dependent Bernoulli components. We use Markov chain Monte Carlo simulations to estimate parameters, and we demonstrate our results using data simulated on a lattice as well as a real dataset.
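
A minimal non-spatial sketch of the copula construction (ignoring the lattice and the regression component; all numbers invented): a Gaussian copula couples one Gaussian and one Bernoulli margin, so the dependence is specified separately from the marginal distributions:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
rho, p, n = 0.7, 0.3, 100_000
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)   # latent copula scale

x = 5.0 + 2.0 * z[:, 0]                                # Gaussian margin N(5, 4)
y = (norm.cdf(z[:, 1]) < p).astype(int)                # Bernoulli(p) margin

print("P(Y=1)      =", y.mean())                       # ~0.3: margin preserved
print("E[X | Y=1]  =", x[y == 1].mean())               # shifted by the copula rho
print("E[X | Y=0]  =", x[y == 0].mean())
```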

15:15 - 16:00 Thomas Richardson (University of Washington) - "Nested Markov Properties for Acyclic Directed Mixed Graphs"

Directed acyclic graph (DAG) models may be characterized in four different ways: via a factorization, the d-separation criterion, the moralization criterion, and the local Markov property. It has been understood for a long time that marginals of DAG models also imply equality constraints that are not conditional independences. The well-known Verma constraint is an example. Tian (2002) provided a general algorithm for enumerating these constraints. Using acyclic directed mixed graphs (ADMGs) we provide a graphical characterization of the constraints given by Tian's algorithm via a nested Markov property. We define our nested property using a simple 'fixing' transformation that we apply to graphs and distributions. This transformation unifies marginalization, conditioning and the computation of intervention distributions via re-weighting. We give four characterizations of our nested model that are analogous to those given for DAGs. We show that marginal distributions of DAG models obey this property. Time permitting I will also describe a simple parametrization of the resulting model in the discrete case, and associated algorithms. (This is joint work with Robin Evans, James Robins and Ilya Shpitser.)
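
For plain DAGs, the moralization criterion mentioned above is short enough to implement directly. The sketch below (a self-contained illustration for ordinary DAGs only, not the nested ADMG machinery of the talk) tests d-separation by moralizing the ancestral subgraph and searching the resulting undirected graph:

```python
from collections import deque

def ancestors(dag, nodes):
    """All nodes with a directed path into `nodes`, plus `nodes` itself."""
    parents = {v: {u for u in dag if v in dag[u]} for v in dag}
    out, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()] - out:
            out.add(p)
            stack.append(p)
    return out

def d_separated(dag, xs, ys, zs):
    keep = ancestors(dag, xs | ys | zs)          # ancestral subgraph
    moral = {v: set() for v in keep}
    for u in keep:
        for v in dag[u] & keep:                  # drop edge directions
            moral[u].add(v)
            moral[v].add(u)
    for v in keep:                               # marry co-parents ("moralize")
        pa = [u for u in keep if v in dag[u]]
        for a in pa:
            for b in pa:
                if a != b:
                    moral[a].add(b)
    seen, queue = set(xs), deque(xs)             # undirected search avoiding zs
    while queue:
        for w in moral[queue.popleft()] - zs - seen:
            if w in ys:
                return False
            seen.add(w)
            queue.append(w)
    return True

# Collider example, A -> B <- C with B -> D:
dag = {"A": {"B"}, "C": {"B"}, "B": {"D"}, "D": set()}
print(d_separated(dag, {"A"}, {"C"}, set()))     # True: marginally independent
print(d_separated(dag, {"A"}, {"C"}, {"D"}))     # False: conditioning opens path
```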

16:30 - 18:00 Reception and posters

Tuesday 25 September

09:00 Arnaud Doucet (University of Oxford) - "Efficient Implementation of MCMC when using an unbiased likelihood estimator"

When an unbiased estimator of the likelihood is used within an MCMC scheme, it is necessary to trade off the number of samples used against computing time. Using many samples for the estimator will result in an MCMC scheme with properties similar to the case where the likelihood is known exactly, but will be expensive. Using few samples for the construction of the estimator will result in faster estimation, but at the expense of slower mixing of the resulting Markov chain. We explore the relationship between the number of samples and the efficiency of the resulting MCMC estimates. Under specific assumptions about the likelihood estimator, we provide guidelines on the number of samples to select for a general Metropolis-Hastings proposal. We additionally provide theory which justifies the use of these assumptions for a large class of models. On a number of examples, we find that the assumptions on the likelihood estimator are accurate. (This is joint work with Mike Pitt (Warwick) and Robert Kohn (UNSW).)
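
A minimal pseudo-marginal Metropolis-Hastings sketch of the construction being analysed (a toy latent-variable model chosen so the exact posterior is known; the proposal scale and N are arbitrary). The key detail is that the likelihood estimate at the current state is stored and reused, never refreshed:

```python
import numpy as np

rng = np.random.default_rng(2)
y = 1.5                                     # one observation
N = 10                                      # samples per likelihood estimate

def lhat(theta):
    """Unbiased Monte Carlo estimate of p(y | theta) = E_x N(y; x, 1)."""
    x = theta + rng.standard_normal(N)      # latent x_i ~ N(theta, 1)
    return np.exp(-0.5 * (y - x) ** 2).mean() / np.sqrt(2 * np.pi)

def log_prior(theta):
    return -0.5 * theta ** 2                # N(0, 1) prior, up to a constant

theta, L = 0.0, lhat(0.0)
chain, accepted = [], 0
for _ in range(20_000):
    prop = theta + 0.8 * rng.standard_normal()
    Lp = lhat(prop)                         # fresh estimate at the proposal
    # Reusing the *stored* estimate L for the current state is what makes
    # the chain target the exact posterior despite the noisy likelihood.
    if np.log(rng.uniform()) < (np.log(Lp) - np.log(L)
                                + log_prior(prop) - log_prior(theta)):
        theta, L, accepted = prop, Lp, accepted + 1
    chain.append(theta)
print("acceptance rate:", accepted / 20_000, " posterior mean:", np.mean(chain))
```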

09:50 Nicolas Chopin (CREST, INSEE) - "Expectation-Propagation for Likelihood-Free Inference"

Many models of interest in the natural and social sciences have no closed-form likelihood function, which means that they cannot be treated using the usual techniques of statistical inference. In the case where such models can be efficiently simulated, Bayesian inference is still possible thanks to the Approximate Bayesian Computation (ABC) algorithm. Although many refinements have since been suggested, the technique suffers from three major shortcomings. First, it requires introducing a vector of "summary statistics", the choice of which is arbitrary and may lead to strong biases. Second, ABC may be excruciatingly slow due to very low acceptance rates. Third, it cannot produce a reliable estimate of the marginal likelihood of the model.

We introduce a technique that solves the first and the third issues, and considerably alleviates the second. We adapt to the likelihood-free context a variational approximation algorithm, Expectation Propagation (Minka, 2001). The resulting algorithm is shown to be faster by a few orders of magnitude than alternative algorithms, while producing an overall approximation error which is typically negligible. Comparisons are performed in three real-world applications which are typical of likelihood-free inference, including one application in neuroscience which is novel, and possibly too challenging for standard ABC techniques. (joint work with Simon Barthelmé)
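
For contrast, the plain rejection-ABC baseline whose shortcomings are listed above fits in a few lines (a hedged toy: Gaussian model, the sample mean as summary, absolute distance, and an arbitrary tolerance):

```python
import numpy as np

rng = np.random.default_rng(3)
y_obs = rng.normal(2.0, 1.0, size=100)      # "observed" data, true theta = 2
s_obs = y_obs.mean()                        # chosen summary statistic

accepted = []
for _ in range(100_000):
    theta = rng.normal(0.0, 5.0)            # draw from the N(0, 25) prior
    s_sim = rng.normal(theta, 1.0, size=100).mean()
    if abs(s_sim - s_obs) < 0.05:           # distance and tolerance epsilon
        accepted.append(theta)
# The low acceptance count illustrates shortcoming two above.
print(len(accepted), "accepted; posterior mean ~", np.mean(accepted))
```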

10:15 Coffee break

10:45 Oliver Ratmann (Duke University and Imperial College) - "Approximate Bayesian Computation based on summaries with frequency properties"

Approximate Bayesian Computation (ABC) has quickly become a valuable tool in many applied fields, but the statistical properties obtained by choosing a particular summary, distance function and error threshold are poorly understood. In an effort to better understand the effect of these ABC tuning parameters, we consider summaries that are associated with empirical distribution functions. These frequency properties of summaries suggest what kinds of distance functions are appropriate, and the validity of the choice of summaries can be assessed on the fly during Monte Carlo simulations. Among valid choices, uniformly most powerful distances can be shown to optimize the ABC acceptance probability. Considering the binding function between the ABC model and the frequency model of the summaries, we can characterize the asymptotic consistency of the ABC maximum-likelihood estimate in general situations. We provide examples from phylogenetics and dynamical systems to demonstrate that empirical distribution functions of summaries can often be obtained without expensive re-simulations, so that the above theoretical results are applicable in a broad set of applications. In part, this work will be illustrated on fitting phylodynamic models that capture the evolution and ecology of interpandemic influenza A (H3N2) to incidence time series and the phylogeny of H3N2's immunodominant haemagglutinin gene.

11:15 Christian Robert (Paris-Dauphine) - "ABC and empirical likelihood"

Approximate Bayesian computation (ABC) has now become an essential tool for the analysis of complex stochastic models when the likelihood function is unavailable. The well-established statistical method of empirical likelihood however provides another route to such settings that bypasses simulations from the model and the choices of the ABC parameters (summary statistics, distance, tolerance), while being provably convergent in the number of observations. Furthermore, avoiding model simulations leads to significant time savings in complex models, such as those used in population genetics. The ABCel algorithm we present in this talk provides in addition an evaluation of its own performances through an associated effective sample size. The method is illustrated on several realistic examples. (Joint work with K.L. Mengersen and P. Pudlo)
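
The empirical-likelihood ingredient can be illustrated on its own (this is Owen-style profile empirical likelihood for a mean, not the full ABCel algorithm; the data and the values of mu are invented). Each observation receives weight w_i = 1/(n(1 + lam (x_i - mu))), with lam solving the estimating equation below; mu must lie inside the convex hull of the data:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=200)    # data with true mean 2

def log_el_ratio(mu):
    d = x - mu
    # lam must keep all weights positive: 1 + lam * d_i > 0 for every i.
    lo = (-1.0 / d.max()) + 1e-8
    hi = (-1.0 / d.min()) - 1e-8
    lam = brentq(lambda l: np.sum(d / (1.0 + l * d)), lo, hi)
    return -np.sum(np.log1p(lam * d))       # 0 at mu = x.mean(), < 0 elsewhere

for mu in (1.6, 1.9, x.mean(), 2.2):
    print(f"mu = {mu:.3f}   log EL ratio = {log_el_ratio(mu):.3f}")
```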

12:00 - 13:00 Lunch

13:00 Steffen Lauritzen (University of Oxford) - TBA

13:50 Paola Vicard (Università Roma Tre) - "Object-Oriented Bayesian Networks for a Decision Support System"

We study an economic decision problem where the actors are two firms and the Antitrust Authority, whose main task is to monitor and prevent firms' potential anti-competitive behaviour. The Antitrust Authority's decision process is modelled using a Bayesian network whose relational structure and parameters are estimated from data provided by the Authority itself. Several economic variables influencing this decision process are included in the model. We analyse how monitoring by the Antitrust Authority affects firms' cooperation strategies. These are modelled as a repeated prisoner's dilemma using object-oriented Bayesian networks, thus enabling integration of the firms' decision processes and external market information.

14:15 Coffee break

14:45 Claudia Tarantola (University of Pavia) - "Conjugate and Conditional Conjugate Bayesian Analysis of Discrete Graphical Models of Marginal Independence"

We present a conjugate and conditional conjugate Bayesian analysis of marginal log-linear models with a bi-directed graph representation. We exploit the connection between bi-directed graphs and DAGs. A bi-directed graph can always be represented via a Markov equivalent DAG with the same vertex set or with the introduction of further vertices, representing latent or hidden variables. The representation in terms of a DAG allows us to use efficient prior distributions based on products of Dirichlet priors. The marginal likelihood of graphs without a direct representation in terms of a DAG is computed using the Chib approximation method. The posterior distribution of the marginal log-linear parameters is obtained via Monte Carlo simulation. The methodology is illustrated with reference to three- and four-way contingency tables. (Joint work with Ioannis Ntzoufras.)

15:15 - 16:00 Alberto Roverato (Università di Bologna) - "Dichotomization invariant log-mean linear parameterization of discrete graphical models of marginal independence"

Graphical models of marginal independence use a graph where every vertex is associated with a variable and missing edges encode marginal independence relationships according to a given Markov property. The probability distribution of a set of discrete variables is characterized by the associated probability table, but defining a suitable parameterization for these models is not straightforward. A basic requirement for the flexible implementation of marginal constraints is that interaction terms involving a subset of variables satisfy upward compatibility, that is, they should reflect a property of the corresponding marginal distribution. Upward compatibility means invariance with respect to marginalization but, for discrete variables with an arbitrary number of levels, a stronger invariance property may be required. Collapsing two or more levels of a discrete variable into a single level can be regarded as a special kind of marginalization, and invariance with respect to this operation is a useful feature for a parameterization.

We extend the Log-Mean Linear (LML) parameterization introduced by Roverato, Lupparelli and La Rocca (2011, arXiv:1109.6239) for binary data to discrete variables with an arbitrary number of levels and show that in this case too it can be used to parameterize graphical models of marginal independence. Furthermore, we show that the LML parameterization satisfies a stronger version of upward compatibility that we call dichotomization invariance. As a consequence, the LML parameterization allows one to simultaneously represent marginal independencies among variables and marginal independencies that only appear when certain levels are collapsed into a single one. This feature is useful in several applied contexts, such as genetic association studies. Furthermore, it provides a natural way to reduce the parameter count by means of substantive constraints that give additional insight on the dependence structure of variables.

16:30 - 18:00 Reception and posters

Wednesday 26 September

09:00 Leonard Held (University of Zurich) - "Objective Bayesian model selection based on test statistics"

In medical research and elsewhere, variable and model selection in regression is a common problem. Bayesian model selection based on test statistics (Johnson, 2005, 2008, Hu and Johnson, 2009) is an approach that eliminates the need to specify proper prior distributions on regression parameters. The method is applicable to generalized linear models and the Cox model using the deviance statistic. In this talk I will review Johnson's methodology and point out connections to earlier work on the link between P-values and Bayes factors. An objective Bayesian model selection procedure is then proposed based on the combination of Johnson's methodology with hyper-g priors. (This is joint work with Daniel Sabanés Bové.)
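
As background to the P-value/Bayes-factor link mentioned above (and not Johnson's test-based Bayes factor itself), two classical bounds can be computed from a z-statistic alone:

```python
import numpy as np
from scipy.stats import norm

for z in (1.96, 2.58, 3.29):
    p = 2 * norm.sf(z)                      # two-sided p-value
    mbf_z = np.exp(-z ** 2 / 2)             # min BF over point alternatives
    mbf_p = -np.e * p * np.log(p)           # Sellke-Bayarri-Berger bound
    print(f"z = {z:.2f}  p = {p:.4f}  "
          f"exp(-z^2/2) = {mbf_z:.4f}  -e p log p = {mbf_p:.4f}")
```

Both bounds show that, for instance, p = 0.05 corresponds to a Bayes factor against the null of at most roughly 1/7 to 1/2.5, far weaker evidence than the p-value suggests at face value.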

09:50 Richard Everitt (Reading University) - "Bayesian parameter estimation for latent MRFs"

In a range of applications, including population genetics, epidemic modelling and social network analysis, the data from which we wish to estimate parameters of interest consists of noisy or incomplete observations of an unobserved process. Bayesian statistics offers a framework in which to tackle this problem, accurately accounting for the uncertainty present due to the missing data. However, standard Markov chain Monte Carlo (MCMC) methods that are used to implement the Bayesian approach can perform poorly in this situation. In this talk we describe two alternatives to standard MCMC approaches: approximate Bayesian computation (ABC) and particle MCMC. Both methods are applied to parameter estimation of a hidden Markov random field, and are compared to the standard data augmentation approach.

10:15 Coffee break

10:45 James Cussens (University of York) - "Model averaging for graphical models by repeated optimal model selection"

In previous work (UAI-2011) the Bayesian network with the globally optimal score (log marginal likelihood, assuming complete data) was found using integer linear programming (ILP). A limit on parent set size was used. By adding suitable constraints it is possible to find the k-best BNs by repeated searches. In this work we (i) consider how to scale up the approach by relaxing the limit on parents and (ii) consider the pros and cons of model averaging by searching for high probability models. This includes a comparison to MCMC.

11:15 Eric Moulines (Telecom-ParisTech) - "Model aggregation in a PAC Bayesian perspective"

Sparse regression addresses inference problems in which the number of parameters p to estimate is large compared to the sample size n. The main problem when the hypothesis space is high-dimensional is to propose estimators displaying favorable statistical performance while still having a manageable computational cost. Estimators based on penalized empirical risk minimization (with an appropriately chosen sparsity-inducing penalty) are known to perform well theoretically but are not able to address the combinatorial explosion of the hypothesis space. The Lasso estimator (and its many variants, like the group Lasso) makes the minimization problem convex, and leads to practical algorithms even when the number of regressors p is large. However, stringent conditions on the design (like the restricted isometry property) have to be imposed to establish fast rates of convergence for this estimator.

Recently, several authors have introduced new classes of estimators achieving good statistical performance for prediction without stringent assumptions on the design and leading to practical algorithms. These methods are all based on some form of aggregation rather than selection of the active regressors; the estimators are obtained by sampling a quantity which might be interpreted as a posterior distribution, whose construction is based on well-founded information-theoretic principles. We will also discuss the statistical performance of this construction using a sparsity oracle inequality in probability. (Joint work with B. Guedj (PhD), G. Biau (Prof. UPMC), P. Alquier (Assistant Prof, Dublin).)
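
A much simpler relative of such aggregation schemes can be sketched exactly (a toy exponential-weights aggregate over all supports of size at most two, with an invented sparsity-favouring prior and temperature; the talk's estimator is sampled rather than enumerated):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(9)
n, p = 80, 10
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(n)

models, risks, priors = [], [], []
for k in range(3):                              # all supports of size <= 2
    for S in combinations(range(p), k):
        Xs = X[:, list(S)] if S else np.zeros((n, 1))
        beta = np.linalg.lstsq(Xs, y, rcond=None)[0]
        resid = y - Xs @ beta
        models.append(S)
        risks.append(resid @ resid / n)         # empirical risk of the model
        priors.append(np.exp(-4.0 * k))         # prior favouring sparsity

logw = -np.array(risks) * n / 4.0 + np.log(priors)   # temperature = 4
w = np.exp(logw - logw.max())
w /= w.sum()
for i in np.argsort(-w)[:3]:                    # aggregate is weight-averaged;
    print(f"support {models[i]}  weight {w[i]:.3f}")  # no single model "wins"
```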

12:00 - 13:00 Lunch

13:00 Gareth Roberts (University of Warwick) - "Some Recent Advances in Optimal Scaling for MCMC"

The talk will present some established and then some recent results on the optimal scaling of MCMC algorithms. The common theme is multi-dimensional settings in which, under some parameterisation, components of the chain converge at different speeds. Examples will include the optimal spacing of temperatures for simulated tempering, and the Metropolis algorithm for ill-posed statistical models as the amount of data goes to infinity.
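
The classical result underlying this area is easy to reproduce numerically (a minimal experiment, not from the talk): for random-walk Metropolis on a d-dimensional standard normal target, a proposal standard deviation of 2.38/sqrt(d) yields an acceptance rate near the optimal 0.234 as d grows:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
for d in (5, 20, 100):
    x = np.zeros(d)
    logp = -0.5 * x @ x                     # log density of N(0, I_d)
    acc = 0
    for _ in range(n):
        prop = x + (2.38 / np.sqrt(d)) * rng.standard_normal(d)
        logp_prop = -0.5 * prop @ prop
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp, acc = prop, logp_prop, acc + 1
    print(f"d = {d:4d}   acceptance rate = {acc / n:.3f}")
```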

13:50 John Kent (University of Leeds) - "The EM algorithm for the matching problem"

Consider two configurations of landmarks in 2 or 3 dimensions such that suitable subsets of landmarks from each configuration can be nearly superimposed after a rigid body transformation. However, both the parameters of the transformation and the correspondences between the landmarks (which can be stored in a zero-one valued matching matrix) are assumed unknown, making this a challenging computational problem sometimes called the matching problem. Two approaches will be contrasted: a global MCMC approach (Green and Mardia, 2006) and a local EM approach based on a relaxation of the integer constraints on the matching matrix.
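
The local EM idea can be sketched in a few lines (a hedged 2-D toy with invented landmarks: the 0-1 matching matrix is relaxed to soft responsibilities in the E-step, and the rigid transformation is re-estimated by weighted Procrustes in the M-step; like any local search it can need good initialization, which is the contrast with the global MCMC approach):

```python
import numpy as np

rng = np.random.default_rng(6)
Y = rng.uniform(0, 10, size=(8, 2))                 # template landmarks
ang = 0.6
R_true = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
X = Y[:6] @ R_true.T + np.array([3.0, -1.0])        # 6 of 8 observed, moved
X += 0.05 * rng.standard_normal(X.shape)            # plus landmark noise

R, t, sigma2 = np.eye(2), np.zeros(2), 4.0
for _ in range(50):
    # E-step: soft matching matrix, each row a distribution over templates.
    d2 = ((X[:, None, :] - (Y @ R.T + t)[None, :, :]) ** 2).sum(-1)
    W = np.exp(-0.5 * d2 / sigma2)
    W /= W.sum(axis=1, keepdims=True)
    # M-step: weighted Procrustes for the rigid transformation.
    w = W.sum()
    xbar = (W.sum(1)[:, None] * X).sum(0) / w
    ybar = (W.sum(0)[:, None] * Y).sum(0) / w
    S = (X - xbar).T @ W @ (Y - ybar)
    U, _, Vt = np.linalg.svd(S)
    D = np.diag([1.0, np.linalg.det(U @ Vt)])       # keep a proper rotation
    R = U @ D @ Vt
    t = xbar - ybar @ R.T
    sigma2 = max((W * d2).sum() / (2 * w), 1e-4)    # crude annealing of the
                                                    # match radius (stale d2)
print("estimated rotation:\n", R, "\nestimated translation:", t)
```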

14:15 Coffee break

14:45 Natalia Bochkina (University of Edinburgh) - "A theoretical perspective on some Bayesian models in image analysis"

This theoretical study is motivated by an example from single photon emission computed tomography (SPECT), a medical imaging technique that involves a tomographic reconstruction of the spatial pattern of a radioactively-labelled substance, known to concentrate in the tissue to be imaged. In the Bayesian model for this problem considered by P. Green (1990), an ill-posed inverse problem with a Poisson likelihood and a pairwise-interaction Markov random field prior, the likelihood is not identifiable and the prior distribution is improper, which raises questions about convergence and concentration of the posterior distribution for this model.

We present results on convergence of the posterior distribution and a local approximation of it around the limit. For regular Bayesian models, such a result is known as the Bernstein-von Mises theorem, which states that the posterior distribution is approximately Gaussian. Applied to calculating functionals of the posterior distribution such as the expected value and the variance, this technique is known as the Laplace approximation, which has been shown to be a good competitor to stochastic simulation methods for performing Bayesian inference. It turns out that the Bayesian model for the SPECT example is non-regular in two ways: the likelihood is not identifiable and the intensities of the true image lie on the boundary of the parameter space. We show how this affects the rates of convergence of the posterior distribution, and that the local approximation of the posterior distribution is not always Gaussian. (This work is joint with Peter Green.)
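
A one-dimensional caricature of the boundary effect (not the SPECT model itself): with Poisson counts and a true intensity of zero, the posterior under a flat prior is exponential rather than Gaussian, so the usual Bernstein-von Mises picture fails:

```python
import numpy as np
from scipy.stats import gamma

n = 50
counts = np.zeros(n)                        # all-zero counts: truth on boundary
# With a flat prior on lam >= 0, the posterior is Gamma(1 + sum counts, n),
# i.e. Exponential(rate n) here.
post = gamma(a=1 + counts.sum(), scale=1 / n)
print("mode:", 0.0, " mean:", post.mean(), " sd:", post.std())
print("P(lam < mean):", post.cdf(post.mean()))   # ~0.63, not the Gaussian 0.5
```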

15:15 - 16:00 Nils Hjort (University of Oslo) - "Credibility, confidence and likelihood"

My presentation will first outline the basics of confidence distributions, which may be seen as frequentist analogues of the Bayesian's posterior distributions. Optimality criteria and recipes will be discussed, and their use will be demonstrated in certain applications. In particular, situations will be exhibited where the Bayesian machinery appears to be in trouble but where the confidence distribution approach works well.

19:00 - late Conference dinner at Riverstation

Thursday 27 September

09:00 Michael Newton (University of Wisconsin-Madison) - "Computational challenges in a role model for gene set analysis"

Often, genome-wide gene-level data require integration with prior biological knowledge that is recorded in collections of gene sets. A model-based approach to this task overcomes limitations of standard approaches, which score sets separately, though deploying inference continues to be challenging in routine applications. I will discuss the structure of a "role model" for gene set analysis and our attempts to address the computational problems with the implied posterior distributions. Calculations involve a highly constrained high-dimensional vector of Bernoulli trials.

09:50 Forrest Crawford (UCLA Biomathematics/Yale University Biostatistics) - "Evolution, branching processes, and Markov models on trees"

Inference of evolutionary relationships between organisms is a vital part of modern biology, and presents a major statistical estimation challenge. Modelers often treat species evolution as a stochastic branching process that generates a phylogenetic tree. Then, on the branches of this tree, another stochastic process gives rise to the observed data: often DNA sequence evolution is treated as a continuous-time Markov chain on the states {A, G, C, T}; quantitative trait changes are modeled by Brownian motion. The dependencies between observations induced by the branching structure of the evolutionary tree can make traditional statistical inference extremely challenging. However, we can exploit this special structure to derive efficient algorithms for inference: evolution of DNA sequences or traits along two branches of a phylogenetic tree occurs independently, conditional on the state of the most recent common ancestor. In this presentation, I outline the basic structure of conditional independence in evolutionary models and describe novel likelihood-based and Bayesian inference techniques for evolutionary processes on phylogenetic trees. I apply these techniques to several concrete and previously intractable problems in genetics, including the evolution of DNA microsatellites, the HIV genome, and gene families.
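
The conditional-independence structure described above is exactly what makes Felsenstein's pruning algorithm work, and a minimal version fits in a few lines (a sketch under the Jukes-Cantor model with an invented three-taxon tree and a single site):

```python
import numpy as np
from scipy.linalg import expm

STATES = "AGCT"
Q = np.full((4, 4), 1.0 / 3.0) - (4.0 / 3.0) * np.eye(4)  # Jukes-Cantor rates

site = {"human": "A", "chimp": "A", "gorilla": "G"}       # one aligned column
# ((human:0.1, chimp:0.1):0.2, gorilla:0.3), as nested (child, branch) pairs:
tree = [([("human", 0.1), ("chimp", 0.1)], 0.2), ("gorilla", 0.3)]

def partial(node):
    """P(observed tips below node | state at node), a length-4 vector."""
    if isinstance(node, str):                    # leaf: point mass on the base
        v = np.zeros(4)
        v[STATES.index(site[node])] = 1.0
        return v
    out = np.ones(4)
    for child, t in node:                        # subtrees are independent
        out *= expm(Q * t) @ partial(child)      # given the state at this node
    return out

lik = 0.25 * partial(tree).sum()                 # uniform root distribution
print("site likelihood:", lik)
```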

10:15 Coffee break

10:45 Joe Herman (University of Leeds) - "Factorised approximations for inference in continuous-time Bayesian networks"

In this work we address the problem of inference in stochastic systems that are composed of multiple subsystems. Models of this type arise in a variety of different circumstances, including many areas of statistical physics. Our particular motivating example is that of biological sequence analysis, whereby the joint system consists of a DNA or amino acid sequence, and the constituent subsystems are the individual letters or sites in the sequence. The evolution of individual characters in such sequences as a result of mutation events has been successfully modelled in the continuous-time Markov chain framework. However, since the state space of the joint system typically increases exponentially with the number of subsystems, naive methods scale very poorly with system size, and joint systems are usually modelled instead as an ensemble of non-interacting subsystems. In the context of biological sequence analysis this involves assuming that each site in a sequence evolves independently. Although such independent-site models and their extensions have proved highly useful, they are often insufficient to capture many of the important details of the evolutionary process.

Here we explore the use of continuous-time Bayesian networks (CTBNs) to model the joint evolution of multiple sites, and focus in particular on methods for computing likelihoods of observed sequences given a phylogenetic tree. In order to do so, we examine a class of CTBNs whose stationary distributions factor according to a graphical model. We then examine the conditions under which this permits the use of expectation-propagation type algorithms for propagating joint distributions forward or backward in time, and use this to formulate an algorithm for computing likelihoods in a model of molecular evolution with inter-site dependencies.
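
The blow-up that motivates this work is easy to exhibit numerically (a minimal sketch, not the talk's algorithm): for independent sites the joint generator is a Kronecker sum and its matrix exponential factorises, but the joint state space is already 4^n for n sites, and any interaction term destroys the factorisation that CTBN approximations try to recover:

```python
import numpy as np
from scipy.linalg import expm

Q = np.full((4, 4), 1.0 / 3.0) - (4.0 / 3.0) * np.eye(4)  # one site (JC rates)
I = np.eye(4)
Q_joint = np.kron(Q, I) + np.kron(I, Q)         # two independent sites, 16 states

t = 0.5
P_joint = expm(Q_joint * t)
# Kronecker-sum structure => the transition matrix factorises exactly:
print(np.allclose(P_joint, np.kron(expm(Q * t), expm(Q * t))))   # True
print("joint states for 10 sites:", 4 ** 10)    # 1,048,576 already
```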

11:15 Christopher Holmes (University of Oxford) - "Reporting predictions from high-dimensional models: a decision theoretic approach"

We describe the use of Bayesian hidden Markov models for robust detection of copy-number aberrations (CNAs) in cancer genomes involving 100,000s of observations (genetic loci) on 100s of samples. CNAs involve stretches of DNA that are duplicated or deleted in tumour cells, and are a key driver of cancer initialisation and progression. At one level of abstraction this can be thought of as a task in change-point modelling or genome segmentation. We describe how Bayesian signal-processing methods using hidden Markov models scaled to large data are ideally suited to this task. We pay particular attention to the reporting of posterior summaries (predictions) under the model. We show how the use of decision theoretic loss functions leads to predictions with good properties relative to penalised likelihood methods. We also present a conditional Viterbi algorithm that allows for efficient exact enumeration of the MAP sample path conditional on a user-specified number of state transitions or change points. The techniques are motivated and demonstrated by ongoing real-world studies.
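
A minimal log-space Viterbi sketch for a three-state copy-number HMM (an illustration with invented means and transition probabilities; the conditional Viterbi variant of the talk additionally augments the state with a change-point counter):

```python
import numpy as np

rng = np.random.default_rng(7)
means = np.array([-0.6, 0.0, 0.6])            # log-ratio mean: loss/normal/gain
truth = np.repeat([1, 2, 1, 0, 1], 40)        # segments with 4 change points
obs = means[truth] + 0.3 * rng.standard_normal(truth.size)

stay = 0.995                                  # sticky transitions
logA = np.log(np.where(np.eye(3) == 1, stay, (1 - stay) / 2))
loglik = -0.5 * ((obs[:, None] - means) / 0.3) ** 2   # up to a constant

delta = np.log(np.ones(3) / 3) + loglik[0]
back = np.zeros((obs.size, 3), dtype=int)
for t in range(1, obs.size):
    cand = delta[:, None] + logA              # previous state x next state
    back[t] = cand.argmax(axis=0)
    delta = cand.max(axis=0) + loglik[t]

path = [int(delta.argmax())]
for t in range(obs.size - 1, 0, -1):          # trace the MAP path backwards
    path.append(int(back[t][path[-1]]))
path.reverse()
print("recovered change points at:", np.flatnonzero(np.diff(path)) + 1)
```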

12:00 - 13:00 Lunch

13:00 Sofia Massa (University of Oxford) - "Graphical models for Milky Way phase space"

The aim of this study is to investigate the structure of the phase space of our galaxy by exploring possible models of conditional independence. The phase space of a dynamical system is the space of all attainable states, which, for a mechanical system such as the Milky Way, comprises the spatial locations and velocity vectors of all galactic particles. The phase space coordinates of galactic particles (stars) sampled from a 4-dimensional volume within the Milky Way are reported in the astronomical literature. The phase space structure of the subspace that this data set is sampled from will be recovered by means of a comparison of the observed data with data generated in non-linear dynamical simulations of Milky Way models.

The complexity of the physical system is captured and investigated via graphical models that also take into consideration the evolution in time of a sample of stars. (This is joint work with Dalia Chakrabarty, University of Warwick.)

13:30 Arnoldo Frigessi (University of Oslo) - "Safe preselection in lasso-type problems by cross-validation freezing"

We propose a new approach to safe variable preselection in high-dimensional penalized regression, such as the lasso. Preselection - to start with a manageable set of covariates - has often been implemented without clear appreciation of its potential bias. Based on a sequential implementation of the lasso with increasing lists of predictors, we find a new property of the set of corresponding cross-validation curves, a pattern that we call freezing. It allows us to determine a subset of covariates with which we reach the same lasso solution as would be obtained using the full set of covariates. I will compare freezing with other recently discussed safe rules for discarding predictors. We demonstrate by simulation that ranking predictors by their univariate correlation with the outcome leads in a majority of cases to early freezing, giving a safe and efficient way of focusing the lasso analysis on a smaller and manageable number of predictors. We illustrate the applicability of our strategy in the context of a GWAS analysis and on microarray genomic data. (This is joint work with Linn Cecilie Bergersen, Ismail Ahmed, Ingrid K. Glad, and Sylvia Richardson.)
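
The preselection experiment is easy to mimic (a hedged sketch: the freezing diagnostic itself is in the paper, and here we only refit the lasso on nested, correlation-ranked predictor sets and watch the solution stabilise; all data are simulated):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(8)
n, p = 100, 2000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                                     # 5 true signals
y = X @ beta + rng.standard_normal(n)

rank = np.argsort(-np.abs(X.T @ y) / n)            # univariate correlation order
alphas = np.logspace(-2, 0.5, 30)
for q in (50, 200, 1000, p):
    cols = rank[:q]                                # top-q preselected predictors
    fit = LassoCV(alphas=alphas, cv=5).fit(X[:, cols], y)
    sel = set(cols[np.flatnonzero(fit.coef_)])
    # Once the chosen alpha and selected set stop changing with q, the
    # solution has stabilised; freezing gives a formal version of this check.
    print(f"q = {q:4d}   chosen alpha = {fit.alpha_:.3f}   "
          f"selected = {len(sel)} predictors")
```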

14:15 - 14:45 Close and coffee