Search Results for author: Chris J. Maddison

Found 34 papers, 22 papers with code

Observational Scaling Laws and the Predictability of Language Model Performance

no code implementations 17 May 2024 Yangjun Ruan, Chris J. Maddison, Tatsunori Hashimoto

However, we show that these variations are consistent with a simple, generalized scaling law where language model performance is a function of a low-dimensional capability space, and model families only vary in their efficiency in converting training compute to capabilities.

Language Modelling

Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

no code implementations 13 Feb 2024 Daniel D. Johnson, Daniel Tarlow, David Duvenaud, Chris J. Maddison

Identifying how much a model ${\widehat{p}}_{\theta}(Y|X)$ knows about the stochastic real-world process $p(Y|X)$ it was trained on is important to ensure it avoids producing incorrect or "hallucinated" answers or taking unsafe actions.

Image Classification Language Modelling +1
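The pair-prediction idea can be illustrated with a toy check (a sketch with made-up distributions, not the paper's method): a model that truly matches $p(Y|X)$ predicts two answers to the same question as independent draws, so its pair prediction factorizes into the product of its marginals, while a model that "cheats" by peeking at the first answer does not.

```python
import numpy as np

def cheating_score(pair_probs: np.ndarray) -> float:
    """Total-variation gap between a model's prediction over answer
    pairs and the product of its marginals. Zero means the two answers
    are predicted as independent draws, i.e. the model is not "cheating"
    by peeking at the first answer to predict the second."""
    marg1 = pair_probs.sum(axis=1)
    marg2 = pair_probs.sum(axis=0)
    return 0.5 * float(np.abs(pair_probs - np.outer(marg1, marg2)).sum())

# A model that matches the true p(Y|X) predicts pairs independently:
honest = np.outer([0.7, 0.3], [0.7, 0.3])
# A "cheating" model copies its first answer into the second:
cheater = np.diag([0.7, 0.3])

print(cheating_score(honest))   # 0.0
print(cheating_score(cheater))  # 0.42
```

The gap between the joint and the factorized prediction is the kind of signal the paper uses to gauge what a model does not know.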

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

1 code implementation 25 Sep 2023 Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, Tatsunori Hashimoto

Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks.

Language Modelling

Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions

no code implementations 4 Oct 2022 Daniel D. Johnson, Ayoub El Hanchi, Chris J. Maddison

We give generalization bounds for downstream linear prediction using our Kernel PCA representation, and show empirically on a set of synthetic tasks that applying Kernel PCA to contrastive learning models can indeed approximately recover the Markov chain eigenfunctions, although the accuracy depends on the kernel parameterization as well as on the augmentation strength.

Contrastive Learning Generalization Bounds
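The Kernel PCA step the abstract describes can be sketched in a few lines (a minimal illustration, with random stand-ins for the learned contrastive embeddings; the paper's actual kernels and models differ):

```python
import numpy as np

def kernel_pca(K: np.ndarray, n_components: int) -> np.ndarray:
    """Kernel PCA from a precomputed kernel (Gram) matrix K:
    double-center, eigendecompose, keep the top components."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    Kc = H @ K @ H
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    # Scale eigenvectors so columns are the principal-component scores.
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 16))   # stand-in for contrastive embeddings
K = Z @ Z.T                      # linear kernel on the representation
components = kernel_pca(K, n_components=4)
print(components.shape)  # (100, 4)
```

In the paper's setting, applying this to a trained contrastive model's representation approximately recovers the augmentation Markov chain's eigenfunctions, with accuracy depending on the kernel parameterization and augmentation strength.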

Learning To Cut By Looking Ahead: Cutting Plane Selection via Imitation Learning

no code implementations 27 Jun 2022 Max B. Paulus, Giulia Zarpellon, Andreas Krause, Laurent Charlin, Chris J. Maddison

Cutting planes are essential for solving mixed-integer linear problems (MILPs), because they facilitate bound improvements on the optimal solution value.

Imitation Learning

Augment with Care: Contrastive Learning for Combinatorial Problems

1 code implementation 17 Feb 2022 Haonan Duan, Pashootan Vaezipoor, Max B. Paulus, Yangjun Ruan, Chris J. Maddison

While typical graph contrastive pre-training uses label-agnostic augmentations, our key insight is that many combinatorial problems have well-studied invariances, which allow for the design of label-preserving augmentations.

Contrastive Learning
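One well-studied invariance of the kind the abstract mentions is variable renaming in SAT: permuting variable names yields a different-looking formula with identical satisfiability. A small sketch of such a label-preserving augmentation (illustrative, not the paper's pipeline):

```python
import itertools
import random

def rename_variables(clauses, rng=random.Random(0)):
    """Label-preserving augmentation for a CNF formula in
    DIMACS-style literals: permute the variable names."""
    n_vars = max(abs(lit) for clause in clauses for lit in clause)
    perm = list(range(1, n_vars + 1))
    rng.shuffle(perm)
    mapping = dict(zip(range(1, n_vars + 1), perm))
    return [[mapping[abs(lit)] * (1 if lit > 0 else -1) for lit in clause]
            for clause in clauses]

def is_satisfiable(clauses):
    """Brute-force SAT check, just to verify the invariance."""
    n_vars = max(abs(lit) for clause in clauses for lit in clause)
    for bits in itertools.product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            return True
    return False

formula = [[1, -2], [2, 3], [-1, -3]]
augmented = rename_variables(formula)
print(is_satisfiable(formula), is_satisfiable(augmented))  # True True
```

Because the augmentation provably preserves the label, positive pairs for contrastive pre-training can be generated without ever solving the instance.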

Bayesian Nonparametrics for Offline Skill Discovery

1 code implementation 9 Feb 2022 Valentin Villecroze, Harry J. Braviner, Panteha Naderian, Chris J. Maddison, Gabriel Loaiza-Ganem

Skills or low-level policies in reinforcement learning are temporally extended actions that can speed up learning and enable complex behaviours.

Imitation Learning reinforcement-learning +2

Optimal Representations for Covariate Shift

2 code implementations ICLR 2022 Yangjun Ruan, Yann Dubois, Chris J. Maddison

Machine learning systems often experience a distribution shift between training and testing.

Ranked #38 on Image Classification on ObjectNet (using extra training data)

Domain Generalization Image Classification +1

Learning Generalized Gumbel-max Causal Mechanisms

1 code implementation NeurIPS 2021 Guy Lorberbom, Daniel D. Johnson, Chris J. Maddison, Daniel Tarlow, Tamir Hazan

To perform counterfactual reasoning in Structural Causal Models (SCMs), one needs to know the causal mechanisms, which provide factorizations of conditional distributions into noise sources and deterministic functions mapping realizations of noise to samples.

Counterfactual Reasoning
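The classic Gumbel-max mechanism this paper generalizes factorizes a categorical draw into fixed exogenous noise plus a deterministic argmax; reusing the same noise realization under a changed distribution yields a coupled counterfactual sample. A sketch (the intervention probabilities are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_max(logits, gumbels):
    """Gumbel-max mechanism: argmax over noise-perturbed log-probs."""
    return int(np.argmax(logits + gumbels))

logits_factual = np.log(np.array([0.5, 0.3, 0.2]))
logits_counterfactual = np.log(np.array([0.1, 0.3, 0.6]))  # after intervention

# The Gumbel noise plays the role of the SCM's exogenous noise source;
# reusing its realization couples factual and counterfactual outcomes.
g = rng.gumbel(size=3)
y_factual = gumbel_max(logits_factual, g)
y_counterfactual = gumbel_max(logits_counterfactual, g)
print(y_factual, y_counterfactual)
```

The paper's contribution is to learn generalized mechanisms of this form rather than fixing the standard Gumbel-max one.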

Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts

no code implementations NeurIPS Workshop ICBINB 2021 Wouter Kool, Chris J. Maddison, Andriy Mnih

Training large-scale mixture of experts models efficiently on modern hardware requires assigning datapoints in a batch to different experts, each with a limited capacity.
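The capacity-constrained assignment problem the abstract describes can be made concrete with a greedy baseline (an illustrative toy, not the paper's balanced-assignment estimator): each datapoint goes to its best-scoring expert that still has room.

```python
import numpy as np

def balanced_assign(scores: np.ndarray, capacity: int) -> np.ndarray:
    """Greedy capacity-constrained assignment of datapoints to experts.
    scores[i, j] is datapoint i's routing score for expert j."""
    n, n_experts = scores.shape
    load = np.zeros(n_experts, dtype=int)
    out = np.full(n, -1)
    for i in np.argsort(-scores.max(axis=1)):    # most confident first
        for j in np.argsort(-scores[i]):
            if load[j] < capacity:
                out[i] = j
                load[j] += 1
                break
    return out

scores = np.random.default_rng(0).normal(size=(8, 2))
print(balanced_assign(scores, capacity=4))
```

Hard assignments like this make routing non-differentiable, which is why the paper studies unbiased gradient estimation for such schemes.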

Learning to Extend Program Graphs to Work-in-Progress Code

no code implementations 28 May 2021 Xuechen Li, Chris J. Maddison, Daniel Tarlow

Source code spends most of its time in a broken or incomplete state during software development.

Code Completion Variable misuse

Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding

1 code implementation ICLR Workshop Neural Compression 2021 Yangjun Ruan, Karen Ullrich, Daniel Severo, James Townsend, Ashish Khisti, Arnaud Doucet, Alireza Makhzani, Chris J. Maddison

Naively applied, our schemes would require more initial bits than the standard bits-back coder, but we show how to drastically reduce this additional cost with couplings in the latent space.

Data Compression

Oops I Took A Gradient: Scalable Sampling for Discrete Distributions

1 code implementation 8 Feb 2021 Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris J. Maddison

We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables.
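The core idea, using the gradient of the log-probability to propose which discrete variable to change, can be sketched on a toy log-linear binary model (a minimal sketch assuming this toy model; the paper handles much richer distributions):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_prob(x, theta):
    """Unnormalized log-probability of a binary vector (toy model)."""
    return float(x @ theta)

def gwg_step(x, theta):
    """One gradient-informed Metropolis-Hastings step: the gradient of
    log p estimates the change from flipping each bit, which defines a
    proposal over coordinates; the flip is then accepted or rejected."""
    grad = theta                           # gradient of x @ theta w.r.t. x
    d = -(2.0 * x - 1.0) * grad            # estimated log-prob change per flip
    q = np.exp(d / 2.0)
    q /= q.sum()                           # proposal over coordinates
    i = rng.choice(len(x), p=q)
    x_new = x.copy()
    x_new[i] = 1.0 - x_new[i]
    # Reverse-proposal probability for the MH correction.
    d_new = -(2.0 * x_new - 1.0) * grad
    q_new = np.exp(d_new / 2.0)
    q_new /= q_new.sum()
    log_accept = (log_prob(x_new, theta) - log_prob(x, theta)
                  + np.log(q_new[i]) - np.log(q[i]))
    return x_new if np.log(rng.uniform()) < log_accept else x

theta = np.array([2.0, -2.0, 0.5])
x = np.zeros(3)
for _ in range(1000):
    x = gwg_step(x, theta)
```

Because the MH correction is exact, the sampler targets the true distribution regardless of the proposal; the gradient only makes the proposal informed.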

Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator

5 code implementations ICLR 2021 Max B. Paulus, Chris J. Maddison, Andreas Krause

Gradient estimation in models with discrete latent variables is a challenging problem, because the simplest unbiased estimators tend to have high variance.
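The baseline the paper Rao-Blackwellizes is the straight-through Gumbel-Softmax estimator. A forward-pass-only sketch in NumPy (in an autodiff framework, the hard one-hot sample is used forward while gradients flow through the soft relaxation):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_st(logits, tau=0.5):
    """Straight-through Gumbel-Softmax, forward pass only: returns the
    hard one-hot sample and the soft relaxation whose gradient would be
    used in the backward pass."""
    g = rng.gumbel(size=logits.shape)
    z = (logits + g) / tau
    y_soft = np.exp(z - z.max())
    y_soft /= y_soft.sum()
    y_hard = np.eye(len(logits))[np.argmax(y_soft)]
    return y_hard, y_soft

logits = np.log(np.array([0.6, 0.3, 0.1]))
hard, soft = gumbel_softmax_st(logits)
print(hard)
```

The hard sample is an exact categorical draw (the argmax is a Gumbel-max sample), which is why the estimator is biased only through the mismatch between the hard forward pass and the soft backward pass.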

Learning Branching Heuristics for Propositional Model Counting

no code implementations 7 Jul 2020 Pashootan Vaezipoor, Gil Lederman, Yuhuai Wu, Chris J. Maddison, Roger Grosse, Sanjit A. Seshia, Fahiem Bacchus

In addition to step count improvements, Neuro# can also achieve orders of magnitude wall-clock speedups over the vanilla solver on larger instances in some problem families, despite the runtime overhead of querying the model.

On Empirical Comparisons of Optimizers for Deep Learning

1 code implementation 11 Oct 2019 Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl

In particular, we find that the popular adaptive gradient methods never underperform momentum or gradient descent.


Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

no code implementations NeurIPS 2020 Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow

A main benefit of DirPG algorithms is that they allow the insertion of domain knowledge in the form of upper bounds on return-to-go at training time, as is done in heuristic search, while still directly computing a policy gradient.

Continuous Hierarchical Representations with Poincaré Variational Auto-Encoders

4 code implementations NeurIPS 2019 Emile Mathieu, Charline Le Lan, Chris J. Maddison, Ryota Tomioka, Yee Whye Teh

We therefore endow VAEs with a Poincaré ball model of hyperbolic geometry as a latent space and rigorously derive the necessary methods to work with two main Gaussian generalisations on that space.

Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives

3 code implementations ICLR 2019 George Tucker, Dieterich Lawson, Shixiang Gu, Chris J. Maddison

Burda et al. (2015) introduced a multi-sample variational bound, IWAE, that is at least as tight as the standard variational lower bound and becomes increasingly tight as the number of samples increases.

Variational Inference
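The multi-sample IWAE bound the abstract refers to, $\log \frac{1}{K}\sum_k w_k$ with importance weights $w_k = p(x, z_k)/q(z_k|x)$, can be estimated on a toy Gaussian model (the prior, likelihood, and proposal below are my own illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_gauss(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def iwae_bound(x, k, n_runs=4000):
    """Monte Carlo estimate of the K-sample IWAE bound for a toy model:
    z ~ N(0, 1), x|z ~ N(z, 1), proposal q(z|x) = N(x/2, 1)."""
    vals = []
    for _ in range(n_runs):
        z = rng.normal(loc=x / 2, scale=1.0, size=k)
        log_w = (log_gauss(z, 0, 1) + log_gauss(x, z, 1)
                 - log_gauss(z, x / 2, 1))
        vals.append(np.logaddexp.reduce(log_w) - np.log(k))
    return float(np.mean(vals))

x = 1.0
# The bound tightens toward log p(x) as the number of samples K grows.
print(iwae_bound(x, 1), iwae_bound(x, 16))
```

K = 1 recovers the standard ELBO; the paper's doubly reparameterized estimator addresses the gradient pathologies that appear as K grows.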

Hamiltonian Descent Methods

4 code implementations 13 Sep 2018 Chris J. Maddison, Daniel Paulin, Yee Whye Teh, Brendan O'Donoghue, Arnaud Doucet

Yet, crucially the kinetic gradient map can be designed to incorporate information about the convex conjugate in a fashion that allows for linear convergence on convex functions that may be non-smooth or non-strongly convex.

Tighter Variational Bounds are Not Necessarily Better

3 code implementations ICML 2018 Tom Rainforth, Adam R. Kosiorek, Tuan Anh Le, Chris J. Maddison, Maximilian Igl, Frank Wood, Yee Whye Teh

We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator.

Filtering Variational Objectives

3 code implementations NeurIPS 2017 Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, Yee Whye Teh

When used as a surrogate objective for maximum likelihood estimation in latent variable models, the evidence lower bound (ELBO) produces state-of-the-art results.

The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

5 code implementations 2 Nov 2016 Chris J. Maddison, Andriy Mnih, Yee Whye Teh

The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution.

Density Estimation Structured Prediction
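The reparameterization described above can be sketched directly: a Concrete (Gumbel-Softmax) sample is a differentiable function of the logits and fixed-distribution Gumbel noise, with a temperature controlling how close it sits to a one-hot vector (a minimal sketch with illustrative temperatures):

```python
import numpy as np

rng = np.random.default_rng(0)

def concrete_sample(logits, tau):
    """Sample from the Concrete relaxation: softmax of Gumbel-perturbed
    logits scaled by a temperature tau. Differentiable in the logits,
    with all randomness in the fixed Gumbel noise."""
    g = rng.gumbel(size=logits.shape)
    z = (logits + g) / tau
    y = np.exp(z - z.max())
    return y / y.sum()

logits = np.log(np.array([0.5, 0.3, 0.2]))
# High temperature: diffuse; low temperature: nearly one-hot.
print(concrete_sample(logits, tau=5.0))
print(concrete_sample(logits, tau=0.05))
```

As the temperature approaches zero the samples concentrate on the vertices of the simplex, recovering discrete categorical draws.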

Move Evaluation in Go Using Deep Convolutional Neural Networks

1 code implementation 20 Dec 2014 Chris J. Maddison, Aja Huang, Ilya Sutskever, David Silver

The game of Go is more challenging than other board games, due to the difficulty of constructing a position or move evaluation function.

Game of Go Position

A* Sampling

no code implementations NeurIPS 2014 Chris J. Maddison, Daniel Tarlow, Tom Minka

The problem of drawing samples from a discrete distribution can be converted into a discrete optimization problem.
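The reduction the abstract states is the Gumbel-max trick, which A* Sampling builds on: drawing an exact categorical sample becomes an argmax over Gumbel-perturbed log-probabilities. A quick empirical check:

```python
import numpy as np

rng = np.random.default_rng(0)

# Gumbel-max trick: sampling as optimization. The argmax of
# log p_i + G_i over i.i.d. Gumbel noise G_i is an exact draw
# from the categorical distribution p.
probs = np.array([0.2, 0.5, 0.3])
n = 20000
gumbels = rng.gumbel(size=(n, 3))
samples = np.argmax(np.log(probs) + gumbels, axis=1)

freqs = np.bincount(samples, minlength=3) / n
print(freqs)  # close to [0.2, 0.5, 0.3]
```

A* Sampling extends this from finite categorical spaces to continuous distributions by lazily maximizing a Gumbel process with branch-and-bound search.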

Structured Generative Models of Natural Source Code

no code implementations 2 Jan 2014 Chris J. Maddison, Daniel Tarlow

We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans.

Annealing between distributions by averaging moments

no code implementations NeurIPS 2013 Roger B. Grosse, Chris J. Maddison, Ruslan R. Salakhutdinov

Many powerful Monte Carlo techniques for estimating partition functions, such as annealed importance sampling (AIS), are based on sampling from a sequence of intermediate distributions which interpolate between a tractable initial distribution and an intractable target distribution.
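The standard construction the paper improves on, AIS with a geometric path of intermediate distributions, can be sketched on a pair of one-dimensional Gaussians (the distributions, schedule, and step size below are illustrative choices; the true log partition-function ratio here is 0):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_f0(x): return -0.5 * x**2             # tractable initial (unnormalized)
def log_f1(x): return -0.5 * (x - 3.0) ** 2   # target (unnormalized)

betas = np.linspace(0.0, 1.0, 50)             # geometric annealing path
n_chains = 2000
x = rng.normal(size=n_chains)                 # exact draws from the initial
log_w = np.zeros(n_chains)
for b_prev, b in zip(betas[:-1], betas[1:]):
    # Accumulate importance weights for the bridge between temperatures.
    log_w += (b - b_prev) * (log_f1(x) - log_f0(x))
    # One Metropolis move targeting the current intermediate distribution.
    def log_p(y): return (1 - b) * log_f0(y) + b * log_f1(y)
    prop = x + rng.normal(scale=0.5, size=n_chains)
    accept = np.log(rng.uniform(size=n_chains)) < log_p(prop) - log_p(x)
    x = np.where(accept, prop, x)

log_Z_ratio = np.logaddexp.reduce(log_w) - np.log(n_chains)
print(log_Z_ratio)  # should be close to 0
```

The paper's contribution is to replace this geometric interpolation with a path defined by averaging moments, which can markedly reduce the variance of the estimate.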
