Search Results for author: David Krueger

Found 41 papers, 20 papers with code

Safety Cases: How to Justify the Safety of Advanced AI Systems

no code implementations • 15 Mar 2024 • Joshua Clymer, Nick Gabrieli, David Krueger, Thomas Larsen

To prepare for these decisions, we investigate how developers could make a 'safety case,' which is a structured rationale that AI systems are unlikely to cause a catastrophe.

A Generative Model of Symmetry Transformations

no code implementations • 4 Mar 2024 • James Urquhart Allingham, Bruno Kacper Mlodozeniec, Shreyas Padhy, Javier Antorán, David Krueger, Richard E. Turner, Eric Nalisnick, José Miguel Hernández-Lobato

Correctly capturing the symmetry transformations of data can lead to efficient models with strong generalization capabilities, though methods incorporating symmetries often require prior knowledge.

Visibility into AI Agents

no code implementations • 23 Jan 2024 • Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, Markus Anderljung

Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks.

Informativeness

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

no code implementations • 22 Dec 2023 • Alan Chan, Ben Bucknall, Herbie Bradley, David Krueger

Public release of the weights of pretrained foundation models, otherwise known as downloadable access (Solaiman et al., 2023), enables fine-tuning without the prohibitive expense of pretraining.

Meta- (out-of-context) learning in neural networks

1 code implementation • 23 Oct 2023 • Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, David Krueger

Brown et al. (2020) famously introduced the phenomenon of in-context learning in large language models (LLMs).

In-Context Learning

Reward Model Ensembles Help Mitigate Overoptimization

1 code implementation • 4 Oct 2023 • Thomas Coste, Usman Anwar, Robert Kirk, David Krueger

Gao et al. (2023) studied this phenomenon in a synthetic human feedback setup with a significantly larger "gold" reward model acting as the true reward (instead of humans) and showed that overoptimization remains a persistent problem regardless of the size of the proxy reward model and training data used.

Model Optimization

Thinker: Learning to Plan and Act

1 code implementation • NeurIPS 2023 • Stephen Chung, Ivan Anokhin, David Krueger

This approach eliminates the need for handcrafted planning algorithms by enabling the agent to learn to plan autonomously, and allows the agent's plan to be easily interpreted through visualization.

Investigating the Nature of 3D Generalization in Deep Neural Networks

1 code implementation • 19 Apr 2023 • Shoaib Ahmed Siddiqui, David Krueger, Thomas Breuel

Modern deep learning architectures for object recognition generalize well to novel views, but the mechanisms are not well understood.

Object Recognition

Unifying Grokking and Double Descent

1 code implementation • 10 Mar 2023 • Xander Davies, Lauro Langosco, David Krueger

A principled understanding of generalization in deep learning may require unifying disparate observations under a single conceptual framework.

On The Fragility of Learned Reward Functions

no code implementations • 9 Jan 2023 • Lev McKinney, Yawen Duan, David Krueger, Adam Gleave

Our work focuses on demonstrating and studying the causes of these relearning failures in the domain of preference-based reward learning.

Continuous Control

Domain Generalization for Robust Model-Based Offline Reinforcement Learning

no code implementations • 27 Nov 2022 • Alan Clark, Shoaib Ahmed Siddiqui, Robert Kirk, Usman Anwar, Stephen Chung, David Krueger

Existing offline reinforcement learning (RL) algorithms typically assume that training data is either: 1) generated by a known policy, or 2) of entirely unknown origin.

Domain Generalization Offline RL +2

Mechanistic Mode Connectivity

1 code implementation • 15 Nov 2022 • Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka

We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss.

Broken Neural Scaling Laws

1 code implementation • 26 Oct 2022 • Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger

Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms are incapable of expressing, such as the non-monotonic transitions present in the scaling behavior of phenomena like double descent, and the delayed, sharp inflection points present in the scaling behavior of tasks like arithmetic.

Adversarial Robustness Continual Learning +8

Towards Out-of-Distribution Adversarial Robustness

1 code implementation • 6 Oct 2022 • Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan

Compared to existing methods, we obtain similar or superior worst-case adversarial robustness on attacks seen during training.

Adversarial Robustness

Defining and Characterizing Reward Hacking

no code implementations • 27 Sep 2022 • Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David Krueger

We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\mathcal{\tilde{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$.
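The definition can be illustrated with a toy numeric example (my own construction, not taken from the paper): a proxy reward that agrees with the true reward for small actions but diverges for large ones, so that optimizing the proxy drives the true reward far below its optimum.

```python
import numpy as np

def true_R(a):
    return a - 0.1 * a**2      # true objective: peaks at a = 5

def proxy_R(a):
    return 1.0 * a             # imperfect proxy: "more is always better"

actions = np.arange(0, 21)
best_true = actions[np.argmax(true_R(actions))]    # a = 5, true reward 2.5
best_proxy = actions[np.argmax(proxy_R(actions))]  # a = 20, true reward -20
# Optimizing the proxy picks a = 20, where the true reward has collapsed:
# exactly the gap between tilde-R and R that the definition formalizes.
```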

Revealing the Incentive to Cause Distributional Shift

no code implementations • 29 Sep 2021 • David Krueger, Tegan Maharaj, Jan Leike

We use these unit tests to demonstrate that changes to the learning algorithm (e.g., introducing meta-learning) can cause previously hidden incentives to be revealed, resulting in qualitatively different behaviour despite no change in the performance metric.

Meta-Learning

Goal Misgeneralization in Deep Reinforcement Learning

4 code implementations • 28 May 2021 • Lauro Langosco, Jack Koch, Lee Sharkey, Jacob Pfau, Laurent Orseau, David Krueger

We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL).

Navigate Out-of-Distribution Generalization +2

Active Reinforcement Learning: Observing Rewards at a Cost

no code implementations • 13 Nov 2020 • David Krueger, Jan Leike, Owain Evans, John Salvatier

Active reinforcement learning (ARL) is a variant of reinforcement learning in which the agent does not observe the reward unless it chooses to pay a query cost c > 0.
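A minimal sketch of the idea (my own illustration, not the paper's algorithm): a two-armed bandit agent that pays the query cost c only on the steps where it chooses to observe the reward, and exploits its running estimates otherwise.

```python
import random

def run_arl_bandit(arm_means=(0.2, 0.8), c=0.05, steps=1000, seed=0):
    """Toy active-RL bandit: reward is observed only when queried, at cost c."""
    rng = random.Random(seed)
    counts, sums = [0, 0], [0.0, 0.0]
    total = 0.0
    for _ in range(steps):
        explore = rng.random() < 0.1
        if explore or 0 in counts:
            arm = rng.randrange(2)          # explore: pick an arm at random
        else:                               # exploit the better estimate
            arm = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
        reward = rng.gauss(arm_means[arm], 0.1)
        total += reward
        if explore or counts[arm] == 0:     # query: pay c to observe reward
            total -= c
            counts[arm] += 1
            sums[arm] += reward
    return total
```

With a fixed seed, raising the query cost c lowers the agent's net return by exactly c times the number of queries, since c affects only the accounting, not the decisions.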

Multi-Armed Bandits reinforcement-learning +1

Hidden Incentives for Auto-Induced Distributional Shift

no code implementations • 19 Sep 2020 • David Krueger, Tegan Maharaj, Jan Leike

We introduce the term auto-induced distributional shift (ADS) to describe the phenomenon of an algorithm causing a change in the distribution of its own inputs.

BIG-bench Machine Learning Meta-Learning +1

AI Research Considerations for Human Existential Safety (ARCHES)

no code implementations • 30 May 2020 • Andrew Critch, David Krueger

Framed in positive terms, this report examines how technical AI research might be steered in a manner that is more attentive to humanity's long-term prospects for survival as a species.

Scalable agent alignment via reward modeling: a research direction

3 code implementations • 19 Nov 2018 • Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions.

Atari Games reinforcement-learning +1

Neural Autoregressive Flows

5 code implementations • ICML 2018 • Chin-wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville

Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF).

Density Estimation Speech Synthesis

Nested LSTMs

1 code implementation • 31 Jan 2018 • Joel Ruben Antony Moniz, David Krueger

We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple levels of memory.

Language Modelling

Deep Prior

no code implementations • 13 Dec 2017 • Alexandre Lacoste, Thomas Boquet, Negar Rostamzadeh, Boris Oreshkin, Wonchang Chung, David Krueger

The recent literature on deep learning offers new tools to learn a rich probability distribution over high dimensional data such as images or sounds.

Regularizing RNNs by Stabilizing Activations

1 code implementation • 26 Nov 2015 • David Krueger, Roland Memisevic

We stabilize the activations of Recurrent Neural Networks (RNNs) by penalizing the squared distance between successive hidden states' norms.
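The penalty described in the abstract can be written directly from that sentence; a minimal numpy sketch (function name and batching convention are my own) computes the squared difference between successive hidden-state norms, averaged over time:

```python
import numpy as np

def norm_stabilizer_penalty(hidden_states, beta=1.0):
    """Penalize the squared distance between successive hidden states' norms.

    hidden_states: array of shape (T, H) -- one hidden vector per time step.
    beta: regularization strength (a hyperparameter).
    """
    norms = np.linalg.norm(hidden_states, axis=1)          # ||h_t|| for each t
    return beta * np.mean((norms[1:] - norms[:-1]) ** 2)   # mean over T-1 gaps
```

A sequence whose hidden states keep a constant norm incurs zero penalty, while growing or shrinking norms are penalized quadratically.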

Language Modelling

NICE: Non-linear Independent Components Estimation

20 code implementations • 30 Oct 2014 • Laurent Dinh, David Krueger, Yoshua Bengio

It is based on the idea that a good representation is one in which the data has a distribution that is easy to model.
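The building block that makes this work in NICE is the additive coupling layer; here is a minimal sketch of one (my own simplification, with a fixed `tanh` standing in for the learned coupling network): split the input in half and shift one half by a function of the other. The map is exactly invertible and volume-preserving, which keeps the transformed distribution easy to model.

```python
import numpy as np

def m(x):
    # Stand-in for the learned coupling network (a neural net in the paper).
    return np.tanh(x)

def coupling_forward(x):
    x1, x2 = np.split(x, 2)
    return np.concatenate([x1, x2 + m(x1)])   # shift second half by m(x1)

def coupling_inverse(y):
    y1, y2 = np.split(y, 2)
    return np.concatenate([y1, y2 - m(y1)])   # exact inverse: subtract m(y1)
```

Because the first half passes through unchanged, inverting the layer needs no root-finding, and the Jacobian is unit-determinant by construction.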

Ranked #73 on Image Generation on CIFAR-10 (bits/dimension metric)

Image Generation

Zero-bias autoencoders and the benefits of co-adapting features

no code implementations • 13 Feb 2014 • Kishore Konda, Roland Memisevic, David Krueger

We show that negative biases are a natural result of using a hidden layer whose responsibility is to both represent the input data and act as a selection mechanism that ensures sparsity of the representation.
