1 code implementation • 16 Jul 2024 • Leo Klarner, Tim G. J. Rudner, Garrett M. Morris, Charlotte M. Deane, Yee Whye Teh
Generative models have the potential to accelerate key steps in the discovery of novel molecular therapeutics and materials.
1 code implementation • 15 Jun 2024 • Silvia Sapora, Gokul Swamy, Chris Lu, Yee Whye Teh, Jakob Nicolaus Foerster
Oftentimes in imitation learning (IL), the environment in which we collect expert demonstrations and the environment in which we want to deploy our learned policy are not exactly the same (e.g., demonstrations collected in simulation but deployment in the real world).
1 code implementation • 11 Apr 2024 • Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Freitas
We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture.
1 code implementation • 13 Mar 2024 • Shengzhuang Chen, Jihoon Tack, Yunqiao Yang, Yee Whye Teh, Jonathan Richard Schwarz, Ying Wei
Recent successes suggest that parameter-efficient fine-tuning of foundation models has become the state-of-the-art method for transfer learning in vision, replacing the rich literature of alternatives such as meta-learning.
Ranked #1 on Few-Shot Image Classification on Meta-Dataset
1 code implementation • 7 Mar 2024 • Jihoon Tack, Jaehyung Kim, Eric Mitchell, Jinwoo Shin, Yee Whye Teh, Jonathan Richard Schwarz
We propose an amortized feature extraction and memory-augmentation approach to compress and extract information from new documents into compact modulations stored in a memory bank.
no code implementations • 3 Mar 2024 • Amal Rannen-Triki, Jorg Bornschein, Razvan Pascanu, Marcus Hutter, Andras György, Alexandre Galashov, Yee Whye Teh, Michalis K. Titsias
We consider the problem of fine-tuning the parameters of a language model online at test time, also known as dynamic evaluation.
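A minimal sketch of dynamic evaluation in its usual form (not necessarily this paper's exact procedure): each chunk of test text is scored with the current parameters and then used for a gradient step before the next chunk. The Hugging-Face-style causal-LM interface, chunk size and learning rate are assumptions for illustration.

import torch

def dynamic_evaluation(model, token_ids, chunk_size=512, lr=1e-5):
    """Online test-time fine-tuning: score each chunk, then adapt on it."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    total_nll, total_tokens = 0.0, 0
    for start in range(0, len(token_ids) - 1, chunk_size):
        chunk = token_ids[start:start + chunk_size + 1].unsqueeze(0)
        inputs, targets = chunk[:, :-1], chunk[:, 1:]
        logits = model(inputs).logits  # assumes an HF-style causal LM interface
        nll = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="sum")
        total_nll += nll.item()
        total_tokens += targets.numel()
        opt.zero_grad()
        (nll / targets.numel()).backward()  # adapt on the chunk just scored
        opt.step()
    return total_nll / total_tokens  # average test NLL with online adaptation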
3 code implementations • 29 Feb 2024 • Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George Cristian-Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Caglar Gulcehre
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale.
2 code implementations • 19 Feb 2024 • Anya Sims, Cong Lu, Yee Whye Teh
The prevailing theoretical understanding is that this can then be viewed as online reinforcement learning in an approximate dynamics model, and any remaining gap is therefore assumed to be due to the imperfect dynamics model.
no code implementations • 1 Feb 2024 • Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang
In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets.
1 code implementation • 28 Dec 2023 • Tim G. J. Rudner, Zonghao Chen, Yee Whye Teh, Yarin Gal
Recognizing that the primary object of interest in most settings is the distribution over functions induced by the posterior distribution over neural network parameters, we frame Bayesian inference in neural networks explicitly as inferring a posterior distribution over functions and propose a scalable function-space variational inference method that allows incorporating prior information and results in reliable predictive uncertainty estimates.
1 code implementation • 28 Dec 2023 • Tim G. J. Rudner, Freddie Bickford Smith, Qixuan Feng, Yee Whye Teh, Yarin Gal
Sequential Bayesian inference over predictive functions is a natural framework for continual learning from streams of data.
1 code implementation • 1 Aug 2023 • Ning Miao, Yee Whye Teh, Tom Rainforth
The recent progress in large language models (LLMs), especially the invention of chain-of-thought prompting, has made it possible to automatically answer questions by stepwise reasoning.
1 code implementation • 14 Jul 2023 • Leo Klarner, Tim G. J. Rudner, Michael Reutlinger, Torsten Schindler, Garrett M. Morris, Charlotte Deane, Yee Whye Teh
Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role.
1 code implementation • NeurIPS 2023 • Emile Mathieu, Vincent Dutordoir, Michael J. Hutchinson, Valentin De Bortoli, Yee Whye Teh, Richard E. Turner
In this work, we extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling.
no code implementations • 14 Jun 2023 • Michalis K. Titsias, Alexandre Galashov, Amal Rannen-Triki, Razvan Pascanu, Yee Whye Teh, Jorg Bornschein
Non-stationarity over the linear predictor weights is modelled using a parameter drift transition density, parametrized by a coefficient that quantifies forgetting.
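As a rough illustration of such a transition density (an AR(1)-style drift chosen here for concreteness, not necessarily the paper's exact parametrisation), a single coefficient controls how quickly past evidence is forgotten:

import numpy as np

def drift_transition(w_prev, alpha, sigma, rng=np.random.default_rng()):
    """Sample w_t | w_{t-1} ~ N(sqrt(1 - alpha) * w_{t-1}, alpha * sigma^2 * I).

    alpha in [0, 1] quantifies forgetting: alpha = 0 keeps the weights fixed
    (a stationary model), alpha = 1 discards the past entirely.
    """
    mean = np.sqrt(1.0 - alpha) * w_prev
    return mean + np.sqrt(alpha) * sigma * rng.standard_normal(w_prev.shape)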
no code implementations • 4 Apr 2023 • Mrinank Sharma, Tom Rainforth, Yee Whye Teh, Vincent Fortuin
Conventional Bayesian Neural Networks (BNNs) are unable to leverage unlabelled data to improve their predictions.
1 code implementation • NeurIPS 2023 • Cong Lu, Philip J. Ball, Yee Whye Teh, Jack Parker-Holder
We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms from limited data.
no code implementations • 20 Feb 2023 • Bobby He, James Martens, Guodong Zhang, Aleksandar Botev, Andrew Brock, Samuel L Smith, Yee Whye Teh
Skip connections and normalisation layers form two standard architectural components that are ubiquitous for the training of Deep Neural Networks (DNNs), but whose precise roles are poorly understood.
no code implementations • 23 Jan 2023 • Jonathan Richard Schwarz, Jihoon Tack, Yee Whye Teh, Jaeho Lee, Jinwoo Shin
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
1 code implementation • NeurIPS 2021 • Tim G. J. Rudner, Cong Lu, Michael A. Osborne, Yarin Gal, Yee Whye Teh
KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks.
1 code implementation • 15 Nov 2022 • Jorg Bornschein, Alexandre Galashov, Ross Hemsley, Amal Rannen-Triki, Yutian Chen, Arslan Chaudhry, Xu Owen He, Arthur Douillard, Massimo Caccia, Qixuang Feng, Jiajun Shen, Sylvestre-Alvise Rebuffi, Kitty Stacpoole, Diego de Las Casas, Will Hawkins, Angeliki Lazaridou, Yee Whye Teh, Andrei A. Rusu, Razvan Pascanu, Marc'Aurelio Ranzato
A shared goal of several machine learning communities like continual learning, meta-learning and transfer learning, is to design algorithms and models that efficiently and robustly adapt to unseen tasks.
no code implementations • 7 Jul 2022 • James Thornton, Michael Hutchinson, Emile Mathieu, Valentin De Bortoli, Yee Whye Teh, Arnaud Doucet
Our proposed method generalizes the Diffusion Schrödinger Bridge introduced in De Bortoli et al. (2021) to the non-Euclidean setting and extends Riemannian score-based models beyond the first time reversal.
no code implementations • 20 Jun 2022 • Sheheryar Zaidi, Tudor Berariu, Hyunjik Kim, Jörg Bornschein, Claudia Clopath, Yee Whye Teh, Razvan Pascanu
However, when deployed alongside other carefully tuned regularization techniques, re-initialization methods offer little to no added benefit for generalization, although optimal generalization performance becomes less sensitive to the choice of learning rate and weight decay hyperparameters.
2 code implementations • 9 Jun 2022 • Cong Lu, Philip J. Ball, Tim G. J. Rudner, Jack Parker-Holder, Michael A. Osborne, Yee Whye Teh
Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and establish competitive baselines for continuous control in the visual domain.
no code implementations • 9 Jun 2022 • Muhammad Faaiz Taufiq, Jean-Francois Ton, Rob Cornish, Yee Whye Teh, Arnaud Doucet
Most off-policy evaluation methods for contextual bandits have focused on the expected outcome of a policy, which is estimated via methods that at best provide only asymptotic guarantees.
1 code implementation • 31 May 2022 • Ning Miao, Tom Rainforth, Emile Mathieu, Yann Dubois, Yee Whye Teh, Adam Foster, Hyunjik Kim
We introduce InstaAug, a method for automatically learning input-specific augmentations from data.
1 code implementation • 31 May 2022 • Sheheryar Zaidi, Michael Schaarschmidt, James Martens, Hyunjik Kim, Yee Whye Teh, Alvaro Sanchez-Gonzalez, Peter Battaglia, Razvan Pascanu, Jonathan Godwin
Many important problems involving molecular property prediction from 3D structures have limited data, posing a generalization challenge for neural networks.
no code implementations • 18 May 2022 • Jonathan Richard Schwarz, Yee Whye Teh
Recent work in Deep Learning has re-imagined the representation of data as functions mapping from a coordinate space to an underlying continuous signal.
1 code implementation • 22 Feb 2022 • Francisca Vasconcelos, Bobby He, Nalini Singh, Yee Whye Teh
To that end, we study a Bayesian reformulation of INRs, UncertaINR, in the context of computed tomography, and evaluate several Bayesian deep learning implementations in terms of accuracy and calibration.
2 code implementations • 6 Feb 2022 • Valentin De Bortoli, Emile Mathieu, Michael Hutchinson, James Thornton, Yee Whye Teh, Arnaud Doucet
Score-based generative models (SGMs) are a powerful class of generative models that exhibit remarkable empirical performance.
1 code implementation • 30 Jan 2022 • Emilien Dupont, Hrushikesh Loya, Milad Alizadeh, Adam Goliński, Yee Whye Teh, Arnaud Doucet
Neural compression algorithms are typically based on autoencoders that require specialized encoder and decoder architectures for different data modalities.
no code implementations • NeurIPS 2021 • Michael Hutchinson, Alexander Terenin, Viacheslav Borovitskiy, So Takao, Yee Whye Teh, Marc Peter Deisenroth
Gaussian processes are machine learning models capable of learning unknown functions in a way that represents uncertainty, thereby facilitating construction of optimal decision-making systems.
2 code implementations • NeurIPS 2021 • Jonathan Schwarz, Siddhant M. Jayakumar, Razvan Pascanu, Peter E. Latham, Yee Whye Teh
The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation, as well as enabling the effective scaling up of models.
1 code implementation • ICLR 2022 • Ning Miao, Emile Mathieu, N. Siddharth, Yee Whye Teh, Tom Rainforth
InteL-VAEs use an intermediary set of latent variables to control the stochasticity of the encoding process, before mapping these in turn to the latent representation using a parametric function that encapsulates our desired inductive bias(es).
1 code implementation • NeurIPS 2021 • Emile Mathieu, Adam Foster, Yee Whye Teh
Learning representations of stochastic processes is an emerging problem in machine learning with applications from meta-learning to physical object models to time series.
1 code implementation • NeurIPS 2021 • Jin Xu, Hyunjik Kim, Tom Rainforth, Yee Whye Teh
We use these layers to construct group equivariant autoencoders (GAEs) that allow us to learn low-dimensional equivariant representations.
no code implementations • NeurIPS 2021 • Siu Lun Chau, Jean-François Ton, Javier González, Yee Whye Teh, Dino Sejdinovic
While causal models are becoming one of the mainstays of machine learning, the problem of uncertainty quantification in causal inference remains challenging.
1 code implementation • ICLR Workshop Neural_Compression 2021 • Emilien Dupont, Adam Goliński, Milad Alizadeh, Yee Whye Teh, Arnaud Doucet
We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image.
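A minimal sketch of this idea, under simplifying assumptions (a plain ReLU MLP and illustrative hyperparameters; the paper's actual architecture and quantisation scheme may differ): fit a small network mapping pixel coordinates to RGB values, then store the weights instead of the pixels.

import torch, torch.nn as nn

def fit_image_inr(image, hidden=64, steps=2000, lr=2e-4):
    """Overfit an MLP f(x, y) -> (r, g, b) to a single image of shape (H, W, 3)."""
    h, w, _ = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    targets = image.reshape(-1, 3)
    net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                        nn.Linear(hidden, hidden), nn.ReLU(),
                        nn.Linear(hidden, 3))
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(coords) - targets) ** 2).mean()
        loss.backward()
        opt.step()
    return net  # storing (quantised) weights of `net` acts as the compressed image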
1 code implementation • 9 Feb 2021 • Emilien Dupont, Yee Whye Teh, Arnaud Doucet
By treating data points as functions, we can abstract away from the specific type of data we train on and construct models that are agnostic to discretization.
1 code implementation • 20 Dec 2020 • Michael Hutchinson, Charline Le Lan, Sheheryar Zaidi, Emilien Dupont, Yee Whye Teh, Hyunjik Kim
Group equivariant neural networks are used as building blocks of group invariant neural networks, which have been shown to improve generalisation performance and data efficiency through principled parameter sharing.
1 code implementation • 25 Nov 2020 • Peter Holderrieth, Michael Hutchinson, Yee Whye Teh
Motivated by objects such as electric fields or fluid streams, we study the problem of learning stochastic fields, i.e., stochastic processes whose samples are fields like those occurring in physics and engineering.
2 code implementations • 29 Oct 2020 • Yueqi Wang, Yoonho Lee, Pallab Basu, Juho Lee, Yee Whye Teh, Liam Paninski, Ari Pakman
While graph neural networks (GNNs) have been successful in encoding graph structures, existing GNN-based methods for community detection are limited by requiring knowledge of the number of communities in advance, in addition to lacking a proper probabilistic formulation to handle uncertainty.
no code implementations • 27 Oct 2020 • Dhruva Tirumala, Alexandre Galashov, Hyeonwoo Noh, Leonard Hasenclever, Razvan Pascanu, Jonathan Schwarz, Guillaume Desjardins, Wojciech Marian Czarnecki, Arun Ahuja, Yee Whye Teh, Nicolas Heess
In this work we consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors that capture the common movement and interaction patterns that are shared across a set of related tasks or contexts.
no code implementations • 10 Sep 2020 • Alexandre Galashov, Jakub Sygnowski, Guillaume Desjardins, Jan Humplik, Leonard Hasenclever, Rae Jeong, Yee Whye Teh, Nicolas Heess
The ability to exploit prior experience to solve novel problems rapidly is a hallmark of biological learning systems and of great practical importance for artificial ones.
1 code implementation • NeurIPS 2020 • Juho Lee, Yoonho Lee, Jungtaek Kim, Eunho Yang, Sung Ju Hwang, Yee Whye Teh
While this "data-driven" way of learning stochastic processes has proven to handle various types of data, NPs still rely on an assumption that uncertainty in stochastic processes is modeled by a single latent variable, which potentially limits the flexibility.
no code implementations • NeurIPS 2020 • Mrinank Sharma, Sören Mindermann, Jan Markus Brauner, Gavin Leech, Anna B. Stephenson, Tomáš Gavenčiak, Jan Kulveit, Yee Whye Teh, Leonid Chindelevitch, Yarin Gal
To what extent are effectiveness estimates of nonpharmaceutical interventions (NPIs) against COVID-19 influenced by the assumptions our models make?
no code implementations • 16 Jul 2020 • Bryn Elesedy, Varun Kanade, Yee Whye Teh
We analyse the pruning procedure behind the lottery ticket hypothesis (arXiv:1803.03635v5), iterative magnitude pruning (IMP), when applied to linear models trained by gradient flow.
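For reference, iterative magnitude pruning in its usual form (a generic sketch, not this paper's analysis, which studies the linear/gradient-flow case): train, prune the smallest-magnitude surviving weights, rewind the survivors to their initial values, and repeat. `train_fn` is a placeholder and is assumed to keep already-masked weights at zero.

import copy
import torch

def iterative_magnitude_pruning(model, train_fn, rounds=5, prune_frac=0.2):
    """Generic IMP loop; `train_fn(model)` trains the model in place."""
    init_state = copy.deepcopy(model.state_dict())      # values to rewind to
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train_fn(model)
        for name, param in model.named_parameters():
            alive = param[masks[name].bool()].abs()
            if alive.numel() == 0:
                continue
            threshold = alive.quantile(prune_frac)       # prune smallest survivors
            masks[name] *= (param.abs() > threshold).float()
        model.load_state_dict(init_state)                # rewind to initialization
        for name, param in model.named_parameters():
            param.data *= masks[name]                    # apply the mask
    return masks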
3 code implementations • NeurIPS 2020 • Bobby He, Balaji Lakshminarayanan, Yee Whye Teh
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK): a recent development in understanding the training dynamics of wide neural networks (NNs).
1 code implementation • NeurIPS 2021 • Sheheryar Zaidi, Arber Zela, Thomas Elsken, Chris Holmes, Frank Hutter, Yee Whye Teh
On a variety of classification tasks and modern architecture search spaces, we show that the resulting ensembles outperform deep ensembles not only in terms of accuracy but also uncertainty calibration and robustness to dataset shift.
no code implementations • ICLR 2020 • Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu
We explore the role of multiplicative interaction as a unifying framework to describe a range of classical and modern neural network architectural motifs, such as gating, attention layers, hypernetworks, and dynamic convolutions amongst others.
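As a toy illustration of this unifying view (shapes and names are purely illustrative, not the paper's implementation), a multiplicative interaction layer lets one input generate the linear map applied to the other; gating and hypernetworks arise as special cases of the same bilinear form.

import torch, torch.nn as nn

class MultiplicativeInteraction(nn.Module):
    """f(x, z) = z'Wx + Uz + Vx + b, where the context z modulates the map on x."""
    def __init__(self, x_dim, z_dim, out_dim):
        super().__init__()
        self.W = nn.Parameter(torch.randn(z_dim, out_dim, x_dim) * 0.01)
        self.U = nn.Linear(z_dim, out_dim, bias=False)
        self.V = nn.Linear(x_dim, out_dim, bias=True)
    def forward(self, x, z):
        Wz = torch.einsum("bz,zox->box", z, self.W)      # context-generated weights
        return torch.einsum("box,bx->bo", Wz, x) + self.U(z) + self.V(x)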
no code implementations • 30 Mar 2020 • Giuseppe Di Benedetto, François Caron, Yee Whye Teh
In particular, the Indian buffet process is a flexible and simple one-parameter feature allocation model where the number of features grows unboundedly with the number of objects.
2 code implementations • 4 Mar 2020 • Joost van Amersfoort, Lewis Smith, Yee Whye Teh, Yarin Gal
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
no code implementations • ICLR 2021 • Soufiane Hayou, Jean-Francois Ton, Arnaud Doucet, Yee Whye Teh
Overparameterized Neural Networks (NNs) display state-of-the-art performance.
1 code implementation • ICML 2020 • Umut Şimşekli, Lingjiong Zhu, Yee Whye Teh, Mert Gürbüzbalaban
Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization algorithms in deep learning.
1 code implementation • ICML 2020 • Jin Xu, Jean-Francois Ton, Hyunjik Kim, Adam R. Kosiorek, Yee Whye Teh
We develop a functional encoder-decoder approach to supervised meta-learning, where labeled data is encoded into an infinite-dimensional functional representation rather than a finite-dimensional one.
1 code implementation • 1 Nov 2019 • Adam Foster, Martin Jankowiak, Matthew O'Meara, Yee Whye Teh, Tom Rainforth
We introduce a fully stochastic gradient based approach to Bayesian optimal experimental design (BOED).
1 code implementation • NeurIPS 2019 • Dushyant Rao, Francesco Visin, Andrei A. Rusu, Yee Whye Teh, Razvan Pascanu, Raia Hadsell
Continual learning aims to improve the ability of modern learning systems to deal with non-stationary distributions, typically by attempting to learn a series of tasks sequentially.
no code implementations • ICML 2020 • Yuan Zhou, Hongseok Yang, Yee Whye Teh, Tom Rainforth
Universal probabilistic programming systems (PPSs) provide a powerful framework for specifying rich probabilistic models.
1 code implementation • 20 Oct 2019 • Saeid Naderiparizi, Adam Ścibior, Andreas Munk, Mehrdad Ghadiri, Atılım Güneş Baydin, Bradley Gram-Hansen, Christian Schroeder de Witt, Robert Zinkov, Philip H. S. Torr, Tom Rainforth, Yee Whye Teh, Frank Wood
Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance.
no code implementations • Approximate Inference (AABI) Symposium 2019 • Bradley Gram-Hansen, Christian Schroeder de Witt, Robert Zinkov, Saeid Naderiparizi, Adam Scibior, Andreas Munk, Frank Wood, Mehrdad Ghadiri, Philip Torr, Yee Whye Teh, Atilim Gunes Baydin, Tom Rainforth
We introduce two approaches for conducting efficient Bayesian inference in stochastic simulators containing nested stochastic sub-procedures, i.e., internal procedures for which the density cannot be calculated directly such as rejection sampling loops.
no code implementations • ICLR 2020 • Juho Lee, Yoonho Lee, Yee Whye Teh
We propose deep amortized clustering (DAC), a neural architecture which learns to cluster datasets efficiently using a few forward passes.
11 code implementations • NeurIPS 2019 • Adam R. Kosiorek, Sara Sabour, Yee Whye Teh, Geoffrey E. Hinton
In the second stage, SCAE predicts parameters of a few object capsules, which are then used to reconstruct part poses.
Ranked #3 on Unsupervised MNIST on MNIST
no code implementations • NeurIPS 2019 • Shufei Ge, Shijia Wang, Yee Whye Teh, Liangliang Wang, Lloyd T. Elliott
The Ostomachion process and the self-consistent binary space partitioning-tree process were recently introduced as generalizations of the Mondrian process for space partitioning with non-axis aligned cuts in the two dimensional plane.
no code implementations • ICML Workshop LifelongML 2020 • Xu He, Jakub Sygnowski, Alexandre Galashov, Andrei A. Rusu, Yee Whye Teh, Razvan Pascanu
One particular formalism that studies learning under non-stationary distribution is provided by continual learning, where the non-stationarity is imposed by a sequence of distinct tasks.
2 code implementations • 7 Jun 2019 • Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Balaji Lakshminarayanan
To determine whether or not inputs reside in the typical set, we propose a statistically principled, easy-to-implement test using the empirical distribution of model likelihoods.
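A sketch of a typicality-style check in this spirit (the bootstrap construction and threshold are illustrative assumptions, not the paper's exact test): compare the average log-likelihood of a batch of inputs against the empirical distribution of log-likelihoods the model assigns to held-out training data.

import numpy as np

def typicality_ood_test(loglik_train, loglik_batch, alpha=0.01, n_boot=10000):
    """Flag a batch as OOD if its mean log-likelihood is atypical under the
    empirical distribution of training log-likelihoods (two-sided test)."""
    m = len(loglik_batch)
    rng = np.random.default_rng(0)
    # bootstrap the null distribution of size-m batch means from training scores
    null_means = np.array([
        rng.choice(loglik_train, size=m, replace=True).mean() for _ in range(n_boot)])
    stat = np.mean(loglik_batch)
    lo, hi = np.quantile(null_means, [alpha / 2, 1 - alpha / 2])
    return not (lo <= stat <= hi)   # True => atypical likelihood, treat as OOD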
no code implementations • 5 Jun 2019 • Jean-Francois Ton, Lucian Chan, Yee Whye Teh, Dino Sejdinovic
Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e., on estimating conditional expectations in regression.
no code implementations • 29 May 2019 • Bradley Gram-Hansen, Christian Schröder de Witt, Tom Rainforth, Philip H. S. Torr, Yee Whye Teh, Atılım Güneş Baydin
Epidemiology simulations have become a fundamental tool in the fight against the epidemics of various infectious diseases like AIDS and malaria.
1 code implementation • 15 May 2019 • Jan Humplik, Alexandre Galashov, Leonard Hasenclever, Pedro A. Ortega, Yee Whye Teh, Nicolas Heess
This includes proposals to learn the learning algorithm itself, an idea also known as meta learning.
no code implementations • 8 May 2019 • Pedro A. Ortega, Jane. X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class.
1 code implementation • ICLR 2019 • Alexandre Galashov, Siddhant M. Jayakumar, Leonard Hasenclever, Dhruva Tirumala, Jonathan Schwarz, Guillaume Desjardins, Wojciech M. Czarnecki, Yee Whye Teh, Razvan Pascanu, Nicolas Heess
In this work we study the possibility of leveraging such repeated structure to speed up and regularize learning.
no code implementations • ICLR 2019 • Jovana Mitrovic, Peter Wirnsberger, Charles Blundell, Dino Sejdinovic, Yee Whye Teh
Infinite-width neural networks have been extensively used to study the theoretical properties underlying the extraordinary empirical success of standard, finite-width neural networks.
no code implementations • ICLR 2019 • Tuan Anh Le, Adam R. Kosiorek, N. Siddharth, Yee Whye Teh, Frank Wood
Discrete latent-variable models, while applicable in a variety of settings, can often be difficult to learn.
6 code implementations • NeurIPS 2019 • Emilien Dupont, Arnaud Doucet, Yee Whye Teh
We show that Neural Ordinary Differential Equations (ODEs) learn representations that preserve the topology of the input space and prove that this implies the existence of functions Neural ODEs cannot represent.
Ranked #21 on Image Classification on MNIST
no code implementations • 28 Mar 2019 • Alexandre Galashov, Jonathan Schwarz, Hyunjik Kim, Marta Garnelo, David Saxton, Pushmeet Kohli, S. M. Ali Eslami, Yee Whye Teh
We introduce a unified probabilistic framework for solving sequential decision making problems ranging from Bayesian optimisation to contextual bandits and reinforcement learning.
no code implementations • 18 Mar 2019 • Dhruva Tirumala, Hyeonwoo Noh, Alexandre Galashov, Leonard Hasenclever, Arun Ahuja, Greg Wayne, Razvan Pascanu, Yee Whye Teh, Nicolas Heess
As reinforcement learning agents are tasked with solving more challenging and diverse tasks, the ability to incorporate prior knowledge into the learning system and to exploit reusable structure in solution space is likely to become increasingly important.
1 code implementation • NeurIPS 2019 • Adam Foster, Martin Jankowiak, Eli Bingham, Paul Horsfall, Yee Whye Teh, Tom Rainforth, Noah Goodman
Bayesian optimal experimental design (BOED) is a principled framework for making efficient use of limited experimental resources.
1 code implementation • 7 Feb 2019 • Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan
We propose a neural hybrid model consisting of a linear model defined on a set of features computed by a deep, invertible transformation (i.e., a normalizing flow).
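Schematically, under the structure described above (a sketch with placeholder components, not the paper's exact architecture), the joint density factorises as log p(x, y) = log p_Z(f(x)) + log|det df/dx| + log p(y | linear(f(x))):

import torch

def hybrid_log_joint(flow, linear_head, x, y):
    """log p(x, y) for a flow-based hybrid model with a linear predictive head.

    `flow(x)` is assumed to return (z, log_det_jacobian); `linear_head` maps z
    to class logits. Both are placeholders for an invertible net and a GLM.
    """
    z, log_det = flow(x)
    log_pz = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(dim=-1)
    log_px = log_pz + log_det                      # exact density via change of variables
    log_py_given_x = torch.distributions.Categorical(logits=linear_head(z)).log_prob(y)
    return log_px + log_py_given_x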
1 code implementation • ICLR 2020 • Michalis K. Titsias, Jonathan Schwarz, Alexander G. de G. Matthews, Razvan Pascanu, Yee Whye Teh
We introduce a framework for Continual Learning (CL) based on Bayesian inference over the function space rather than the parameters of a deep neural network.
no code implementations • 18 Jan 2019 • Benjamin Bloem-Reddy, Yee Whye Teh
Treating neural network inputs and outputs as random variables, we characterize the structure of neural networks that can be used to model data that are invariant or equivariant under the action of a compact group.
4 code implementations • NeurIPS 2019 • Emile Mathieu, Charline Le Lan, Chris J. Maddison, Ryota Tomioka, Yee Whye Teh
We therefore endow VAEs with a Poincaré ball model of hyperbolic geometry as a latent space and rigorously derive the necessary methods to work with two main Gaussian generalisations on that space.
7 code implementations • ICLR 2019 • Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, Ali Eslami, Dan Rosenbaum, Oriol Vinyals, Yee Whye Teh
Neural Processes (NPs) (Garnelo et al., 2018a,b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions.
1 code implementation • 6 Dec 2018 • Emile Mathieu, Tom Rainforth, N. Siddharth, Yee Whye Teh
We develop a generalisation of disentanglement in VAEs (decomposition of the latent representation), characterising it as the fulfilment of two factors: a) the latent encodings of the data having an appropriate level of overlap, and b) the aggregate encoding of the data conforming to a desired structure, represented through the prior.
1 code implementation • NeurIPS 2018 • Jianfei Chen, Jun Zhu, Yee Whye Teh, Tong Zhang
However, sEM has a slower asymptotic convergence rate than batch EM, and requires a decreasing sequence of step sizes, which is difficult to tune.
no code implementations • ICLR 2019 • Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, Nicolas Heess
We focus on the problem of learning a single motor module that can flexibly express a range of behaviors for the control of high-dimensional physically simulated humanoids.
1 code implementation • ICLR 2019 • Stefan Webb, Tom Rainforth, Yee Whye Teh, M. Pawan Kumar
Furthermore, it provides an ability to scale to larger networks than formal verification approaches.
no code implementations • 31 Oct 2018 • Xiaoyu Lu, Tom Rainforth, Yuan Zhou, Jan-Willem van de Meent, Yee Whye Teh
We study adaptive importance sampling (AIS) as an online learning problem and argue for the importance of the trade-off between exploration and exploitation in this adaptation.
4 code implementations • ICLR 2019 • Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan
A neural network deployed in the wild may be asked to make predictions for inputs that were drawn from a different distribution than that of the training data.
9 code implementations • 1 Oct 2018 • Juho Lee, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, Yee Whye Teh
Many machine learning tasks such as multiple instance learning, 3D shape recognition, and few-shot image classification are defined on sets of instances.
4 code implementations • 13 Sep 2018 • Chris J. Maddison, Daniel Paulin, Yee Whye Teh, Brendan O'Donoghue, Arnaud Doucet
Yet, crucially, the kinetic gradient map can be designed to incorporate information about the convex conjugate in a fashion that allows for linear convergence on convex functions that may be non-smooth or non-strongly convex.
1 code implementation • 9 Jul 2018 • Benjamin Bloem-Reddy, Adam Foster, Emile Mathieu, Yee Whye Teh
Empirical evidence suggests that heavy-tailed degree distributions occurring in many real networks are well-approximated by power laws with exponents $\eta$ that may take values either less than or greater than two.
17 code implementations • ICML 2018 • Marta Garnelo, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, S. M. Ali Eslami
Deep neural networks excel at function approximation, yet they are typically trained from scratch for each new function.
13 code implementations • 4 Jul 2018 • Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, Yee Whye Teh
A neural network (NN) is a parameterised function that can be tuned via gradient descent to approximate a labelled collection of data with high precision.
no code implementations • ICML 2018 • Wojciech Czarnecki, Siddhant Jayakumar, Max Jaderberg, Leonard Hasenclever, Yee Whye Teh, Nicolas Heess, Simon Osindero, Razvan Pascanu
We introduce Mix and Match (M&M), a training framework designed to facilitate rapid and effective learning in RL agents that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents.
no code implementations • 25 Jun 2018 • Tom Rainforth, Yuan Zhou, Xiaoyu Lu, Yee Whye Teh, Frank Wood, Hongseok Yang, Jan-Willem van de Meent
We introduce inference trees (ITs), a new class of inference methods that build on ideas from Monte Carlo tree search to perform adaptive sampling in a manner that balances exploration with exploitation, ensures consistency, and alleviates pathologies in existing adaptive methods.
no code implementations • 15 Jun 2018 • Jin Xu, Yee Whye Teh
We develop a method for user-controllable semantic image inpainting: Given an arbitrary set of observed pixels, the unobserved pixels can be imputed in a user-controllable range of possibilities, each of which is semantically coherent and locally consistent with the observed pixels.
1 code implementation • NeurIPS 2018 • Adam R. Kosiorek, Hyunjik Kim, Ingmar Posner, Yee Whye Teh
It can reliably discover and track objects throughout the sequence of frames, and can also generate future frames conditioned on the current frame, thereby simulating expected motion of objects.
no code implementations • 5 Jun 2018 • Wojciech Marian Czarnecki, Siddhant M. Jayakumar, Max Jaderberg, Leonard Hasenclever, Yee Whye Teh, Simon Osindero, Nicolas Heess, Razvan Pascanu
(2) We further show that M&M can be used successfully to progress through a curriculum of architectural variants defining an agent's internal state.
1 code implementation • ICLR 2019 • Tuan Anh Le, Adam R. Kosiorek, N. Siddharth, Yee Whye Teh, Frank Wood
Stochastic control-flow models (SCFMs) are a class of generative models that involve branching on choices from discrete random variables.
no code implementations • ICML 2018 • Jonathan Schwarz, Jelena Luketina, Wojciech M. Czarnecki, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, Raia Hadsell
This is achieved by training a network with two components: A knowledge base, capable of solving previously encountered problems, which is connected to an active column that is employed to efficiently learn the current task.
no code implementations • NeurIPS 2018 • Jovana Mitrovic, Dino Sejdinovic, Yee Whye Teh
Discovering the causal structure among a set of variables is a fundamental problem in many areas of science.
1 code implementation • NeurIPS 2018 • Xenia Miscouridou, François Caron, Yee Whye Teh
We propose a novel class of network models for temporal dyadic interaction data.
no code implementations • 22 Feb 2018 • Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh
Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance.
3 code implementations • ICML 2018 • Tom Rainforth, Adam R. Kosiorek, Tuan Anh Le, Chris J. Maddison, Maximilian Igl, Frank Wood, Yee Whye Teh
We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator.
no code implementations • NeurIPS 2018 • Stefan Webb, Adam Golinski, Robert Zinkov, N. Siddharth, Tom Rainforth, Yee Whye Teh, Frank Wood
Inference amortization methods share information across multiple posterior-inference problems, allowing each to be carried out more efficiently.
no code implementations • 20 Nov 2017 • Giuseppe Di Benedetto, François Caron, Yee Whye Teh
Along with this result, we provide the asymptotic behaviour of the number of clusters of a given size, and show that the model can exhibit a power-law behavior, controlled by another parameter.
no code implementations • NeurIPS 2017 • Yee Whye Teh, Victor Bapst, Wojciech Marian Czarnecki, John Quan, James Kirkpatrick, Raia Hadsell, Nicolas Heess, Razvan Pascanu
Moreover, the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning.
no code implementations • 8 Jun 2017 • Hyunjik Kim, Yee Whye Teh
Automating statistical modelling is a challenging problem in artificial intelligence.
3 code implementations • NeurIPS 2017 • Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, Yee Whye Teh
When used as a surrogate objective for maximum likelihood estimation in latent variable models, the evidence lower bound (ELBO) produces state-of-the-art results.
no code implementations • 16 Mar 2017 • Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Arnaud Doucet, Andriy Mnih, Yee Whye Teh
The policy gradients of the expected return objective can react slowly to rare rewards.
no code implementations • 22 Nov 2016 • Valerio Perrone, Paul A. Jenkins, Dario Spano, Yee Whye Teh
We present the Wright-Fisher Indian buffet process (WF-IBP), a probabilistic model for time-dependent data assumed to have been generated by an unknown number of latent features.
1 code implementation • 11 Nov 2016 • Seth Flaxman, Danica J. Sutherland, Yu-Xiang Wang, Yee Whye Teh
We combine fine-grained spatially referenced census data with the vote outcomes from the 2016 US presidential election.
no code implementations • 7 Nov 2016 • Tamara Fernández, Yee Whye Teh
In this paper, we prove almost surely consistency of a Survival Analysis model, which puts a Gaussian process, mapped to the unit interval, as a prior on the so-called hazard function.
5 code implementations • 2 Nov 2016 • Chris J. Maddison, Andriy Mnih, Yee Whye Teh
The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution.
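For a discrete stochastic node, one standard instance of this refactoring is the Concrete (Gumbel-Softmax) relaxation, sketched below; the temperature is a free parameter and the exact form used in practice may differ.

import torch

def sample_concrete(logits, temperature=0.5):
    """Differentiable relaxation of a categorical sample: a deterministic
    function of the parameters (logits) and fixed-distribution Gumbel noise."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits)))   # Gumbel(0, 1) noise
    return torch.softmax((logits + gumbel) / temperature, dim=-1)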
no code implementations • 2 Nov 2016 • Lloyd T. Elliott, Yee Whye Teh
We develop a new nonparametric model of genetic sequence data, based on the hierarchical Dirichlet process, which supports these self transitions and nonhomogeneity.
no code implementations • NeurIPS 2016 • Tamara Fernández, Nicolás Rivera, Yee Whye Teh
We introduce a semi-parametric Bayesian model for survival analysis.
no code implementations • 27 Oct 2016 • Seth Flaxman, Yee Whye Teh, Dino Sejdinovic
However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite-dimensional problem.
no code implementations • 14 Sep 2016 • Xiaoyu Lu, Valerio Perrone, Leonard Hasenclever, Yee Whye Teh, Sebastian J. Vollmer
Based on this, we develop relativistic stochastic gradient descent by taking the zero-temperature limit of relativistic stochastic gradient Hamiltonian Monte Carlo.
no code implementations • 7 Jul 2016 • Marco Battiston, Stefano Favaro, Daniel M. Roy, Yee Whye Teh
We characterize the class of exchangeable feature allocations assigning probability $V_{n, k}\prod_{l=1}^{k}W_{m_{l}}U_{n-m_{l}}$ to a feature allocation of $n$ individuals, displaying $k$ features with counts $(m_{1},\ldots, m_{k})$ for these features.
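Read directly off the displayed probability, such an allocation's probability can be evaluated as follows (the functions V, W and U are whatever the particular model specifies; this is just the formula as code):

def feature_allocation_prob(V, W, U, n, counts):
    """Probability V_{n,k} * prod_l W_{m_l} * U_{n - m_l} of a feature allocation
    of n individuals with feature counts m_1, ..., m_k (V, W, U supplied by user)."""
    k = len(counts)
    p = V(n, k)
    for m in counts:
        p *= W(m) * U(n - m)
    return p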
no code implementations • 6 Jul 2016 • Cian Naik, Francois Caron, Judith Rousseau, Yee Whye Teh, Konstantina Palla
In this paper we propose a Bayesian nonparametric approach to modelling sparse time-varying networks.
no code implementations • 16 Jun 2016 • Matej Balog, Balaji Lakshminarayanan, Zoubin Ghahramani, Daniel M. Roy, Yee Whye Teh
We introduce the Mondrian kernel, a fast random feature approximation to the Laplace kernel.
no code implementations • 23 May 2016 • Hyunjik Kim, Xiaoyu Lu, Seth Flaxman, Yee Whye Teh
We tackle the problem of collaborative filtering (CF) with side information, through the lens of Gaussian Process (GP) regression.
no code implementations • 15 Feb 2016 • Jovana Mitrovic, Dino Sejdinovic, Yee Whye Teh
Approximate Bayesian computation (ABC) is an inference framework that constructs an approximation to the true likelihood based on the similarity between the observed and simulated data as measured by a predefined set of summary statistics.
no code implementations • 31 Dec 2015 • Leonard Hasenclever, Stefan Webb, Thibaut Lienart, Sebastian Vollmer, Balaji Lakshminarayanan, Charles Blundell, Yee Whye Teh
The posterior server allows scalable and robust Bayesian learning in cases where a data set is stored in a distributed manner across a cluster, with each compute node containing a disjoint subset of data.
1 code implementation • 18 Jul 2015 • Matej Balog, Yee Whye Teh
We outline a slight adaptation of this algorithm to regression, as the remainder of the report uses regression as a case study of how Mondrian processes can be utilized in machine learning.
1 code implementation • NeurIPS 2015 • Thibaut Lienart, Yee Whye Teh, Arnaud Doucet
The computational complexity of our algorithm at each iteration is quadratic in the number of particles.
1 code implementation • 11 Jun 2015 • Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh
Many real-world regression problems demand a measure of the uncertainty associated with each prediction.
no code implementations • 16 Feb 2015 • Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh
Additive regression trees are flexible non-parametric models and popular off-the-shelf tools for real-world non-linear regression.
no code implementations • NeurIPS 2014 • Minjie Xu, Balaji Lakshminarayanan, Yee Whye Teh, Jun Zhu, Bo Zhang
We propose a distributed Markov chain Monte Carlo (MCMC) inference algorithm for large scale Bayesian posterior simulation.
no code implementations • 1 Sep 2014 • Yee Whye Teh, Alexandre Thiéry, Sebastian Vollmer
Applying standard Markov chain Monte Carlo (MCMC) algorithms to large data sets is computationally expensive.
no code implementations • 18 Jul 2014 • Pablo G. Moreno, Yee Whye Teh, Fernando Perez-Cruz, Antonio Artés-Rodríguez
Crowdsourcing has been proven to be an effective and efficient tool to annotate large datasets.
no code implementations • 16 Jul 2014 • María Lomelí, Stefano Favaro, Yee Whye Teh
We investigate the class of $\sigma$-stable Poisson-Kingman random probability measures (RPMs) in the context of Bayesian nonparametric mixture modeling.
no code implementations • NeurIPS 2014 • Brooks Paige, Frank Wood, Arnaud Doucet, Yee Whye Teh
We introduce a new sequential Monte Carlo algorithm we call the particle cascade.
2 code implementations • NeurIPS 2014 • Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh
Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics.
no code implementations • 31 May 2014 • Tue Herlau, Morten Mørup, Yee Whye Teh, Mikkel N. Schmidt
Bayesian mixture models are widely applied for unsupervised learning and exploratory data analysis.
no code implementations • NeurIPS 2013 • Charles Blundell, Yee Whye Teh
We propose an efficient Bayesian nonparametric model for discovering hierarchical community structure in social networks.
no code implementations • NeurIPS 2013 • Xinhua Zhang, Wee Sun Lee, Yee Whye Teh
For the representer theorem to hold, the linear functionals are required to be bounded in the RKHS, and we show that this is true for a variety of commonly used RKHS and invariances.
no code implementations • NeurIPS 2013 • Sam Patterson, Yee Whye Teh
In this paper we investigate the use of Langevin Monte Carlo methods on the probability simplex and propose a new method, Stochastic gradient Riemannian Langevin dynamics, which is simple to implement and can be applied online.
no code implementations • 30 Apr 2013 • Balaji Lakshminarayanan, Yee Whye Teh
A popular approach for large scale data annotation tasks is crowdsourcing, wherein each data point is labeled by multiple noisy annotators.
no code implementations • 3 Mar 2013 • Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh
Unlike classic decision tree learning algorithms like ID3, C4.5 and CART, which work in a top-down manner, existing Bayesian algorithms produce an approximation to the posterior distribution by evolving a complete tree (or collection thereof) iteratively via local Monte Carlo modifications to the structure of the tree, e.g., using Markov chain Monte Carlo (MCMC).
no code implementations • 21 Nov 2012 • François Caron, Yee Whye Teh, Thomas Brendan Murphy
In this paper we propose a Bayesian nonparametric model for clustering partial ranking data.
1 code implementation • 9 May 2012 • Arthur Asuncion, Max Welling, Padhraic Smyth, Yee Whye Teh
Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data.
1 code implementation • ICML 2011 2011 • Max Welling, Yee Whye Teh
In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches.
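One widely used update in this mini-batch framework takes a stochastic-gradient Langevin step: a posterior-gradient estimate from a small mini-batch plus injected Gaussian noise scaled to the step size. The sketch below is a standard textbook form with placeholder gradient functions, offered for illustration rather than as the paper's exact algorithm.

import numpy as np

def sgld_step(theta, minibatch, grad_log_prior, grad_log_lik, step_size, n_total):
    """One stochastic gradient Langevin dynamics update on parameters theta."""
    n_batch = len(minibatch)
    grad = grad_log_prior(theta) + (n_total / n_batch) * sum(
        grad_log_lik(theta, x) for x in minibatch)      # unbiased mini-batch gradient
    noise = np.random.normal(0.0, np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad + noise       # gradient step plus injected noise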