We introduce Support Decomposition Variational Inference (SDVI), a new variational inference (VI) approach for probabilistic programs with stochastic support.
The posterior in probabilistic programs with stochastic support decomposes as a weighted sum of the local posterior distributions associated with each possible program path.
The recent progress in large language models (LLMs), especially the invention of chain-of-thought prompting, has made it possible to automatically answer questions by stepwise reasoning.
The predictions of Large Language Models (LLMs) on downstream tasks often improve significantly when including examples of the input--label relationship in the context.
While conformal predictors reap the benefits of rigorous statistical guarantees for their error frequency, the size of their corresponding prediction sets is critical to their practical utility.
Information-theoretic approaches to active learning have traditionally focused on maximising the information gathered about the model parameters, most commonly by optimising the BALD score.
Conventional Bayesian Neural Networks (BNNs) cannot leverage unlabelled data to improve their predictions.
We formalize the problem of contextual optimization through the lens of Bayesian experimental design and propose CO-BED -- a general, model-agnostic framework for designing contextual experiments using information-theoretic principles.
We investigate the benefit of treating all the parameters in a Bayesian neural network stochastically and find compelling theoretical and empirical evidence that this standard construction may be unnecessary.
We introduce InstaAug, a method for automatically learning input-specific augmentations from data.
We provide the first complete continuous time framework for denoising diffusion models of discrete data.
We propose Active Surrogate Estimators (ASEs), a new method for label-efficient model evaluation.
We introduce implicit Deep Adaptive Design (iDAD), a new method for performing adaptive experiments in real-time with implicit models.
We present a variational method for online state estimation and parameter learning in state-space models (SSMs), a ubiquitous class of latent variable models for sequential data.
InteL-VAEs use an intermediary set of latent variables to control the stochasticity of the encoding process, before mapping these in turn to the latent representation using a parametric function that encapsulates our desired inductive bias(es).
Here we introduce a novel alternative, the MEME, that avoids such explicit combinations by repurposing semi-supervised VAEs to combine information between modalities implicitly through mutual supervision.
Expanding on MacKay (1992), we argue that conventional model-based methods for active learning - like BALD - have a fundamental shortfall: they fail to directly account for the test-time distribution of the input variables.
We show that the standard computational pipeline of probabilistic programming systems (PPSs) can be inefficient for estimating expectations and introduce the concept of expectation programming to address this.
We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input.
While approaches like active learning reduce the number of labels needed for model training, existing literature largely ignores the cost of labeling test data, typically unrealistically assuming large test sets for model evaluation.
We introduce Deep Adaptive Design (DAD), a method for amortizing the cost of adaptive Bayesian experimental design that allows experiments to be run in real-time.
We introduce an approach for training Variational Autoencoders (VAEs) that are certifiably robust to adversarial attack.
Active learning is a powerful tool when labelling data is expensive, but it introduces a bias because the training data no longer follows the population distribution.
We show that the gradient estimates used in training Deep Gaussian Processes (DGPs) with importance-weighted variational inference are susceptible to signal-to-noise ratio (SNR) issues.
We propose methods to strengthen the invariance properties of representations obtained by contrastive learning.
We make inroads into understanding the robustness of Variational Autoencoders (VAEs) to adversarial attacks and other input perturbations.
We present a principled approach to incorporating labels in VAEs that captures the rich characteristic information associated with those labels.
The SRR provides a distinct and complementary measure of robust performance, compared to natural and adversarial risk.
We introduce a fully stochastic gradient based approach to Bayesian optimal experimental design (BOED).
Universal probabilistic programming systems (PPSs) provide a powerful framework for specifying rich probabilistic models.
1 code implementation • 20 Oct 2019 • Saeid Naderiparizi, Adam Ścibior, Andreas Munk, Mehrdad Ghadiri, Atılım Güneş Baydin, Bradley Gram-Hansen, Christian Schroeder de Witt, Robert Zinkov, Philip H. S. Torr, Tom Rainforth, Yee Whye Teh, Frank Wood
Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance.
no code implementations • • Bradley Gram-Hansen, Christian Schroeder de Witt, Robert Zinkov, Saeid Naderiparizi, Adam Scibior, Andreas Munk, Frank Wood, Mehrdad Ghadiri, Philip Torr, Yee Whye Teh, Atilim Gunes Baydin, Tom Rainforth
We introduce two approaches for conducting efficient Bayesian inference in stochastic simulators containing nested stochastic sub-procedures, i. e., internal procedures for which the density cannot be calculated directly such as rejection sampling loops.
We make significant advances in addressing this issue by introducing methods for producing adversarially robust VAEs.
Recently there has been a significant interest in learning disentangled representations, as they promise increased interpretability, generalization to unseen scenarios and faster learning on downstream tasks.
Epidemiology simulations have become a fundamental tool in the fight against the epidemics of various infectious diseases like AIDS and malaria.
Bayesian optimal experimental design (BOED) is a principled framework for making efficient use of limited experimental resources.
We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables.
We develop a generalisation of disentanglement in VAEs---decomposition of the latent representation---characterising it as the fulfilment of two factors: a) the latent encodings of the data having an appropriate level of overlap, and b) the aggregate encoding of the data conforming to a desired structure, represented through the prior.
We study adaptive importance sampling (AIS) as an online learning problem and argue for the importance of the trade-off between exploration and exploitation in this adaptation.
We introduce inference trees (ITs), a new class of inference methods that build on ideas from Monte Carlo tree search to perform adaptive sampling in a manner that balances exploration with exploitation, ensures consistency, and alleviates pathologies in existing adaptive methods.
Hamiltonian Monte Carlo (HMC) is arguably the dominant statistical inference algorithm used in most popular "first-order differentiable" Probabilistic Programming Languages (PPLs).
We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator.
Inference amortization methods share information across multiple posterior-inference problems, allowing each to be carried out more efficiently.
Many problems in machine learning and statistics involve nested expectations and thus do not permit conventional Monte Carlo (MC) estimation.
We present the first general purpose framework for marginal maximum a posteriori estimation of probabilistic program variables.
We build on auto-encoding sequential Monte Carlo (AESMC): a method for model and proposal learning based on maximizing the lower bound to the log marginal likelihood in a broad family of structured probabilistic models.
Existing methods for structure discovery in time series data construct interpretable, compositional kernels for Gaussian process regression models.
We introduce interacting particle Markov chain Monte Carlo (iPMCMC), a PMCMC method based on an interacting pool of standard and conditional sequential Monte Carlo samplers.