no code implementations • 19 Feb 2024 • Pedro Freire, ChengCheng Tan, Adam Gleave, Dan Hendrycks, Scott Emmons
Do language models implicitly learn a concept of human wellbeing?
1 code implementation • 21 Dec 2023 • Kellin Pelrine, Mohammad Taufeeque, Michał Zając, Euan McLean, Adam Gleave
Language model attacks typically assume one of two extreme threat models: full white-box access to model weights, or black-box access limited to a text generation API.
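The two extremes above bracket a spectrum: deployed models often expose intermediate ("grey-box") access, such as a fine-tuning API. A toy Python sketch of this framing (the enum and capability names are hypothetical, not from the paper):

```python
from enum import Enum

class ThreatModel(Enum):
    WHITE_BOX = "white_box"   # full access to model weights and gradients
    GREY_BOX = "grey_box"     # e.g. a fine-tuning or function-calling API
    BLACK_BOX = "black_box"   # text-generation queries only

# Capabilities available to an attacker under each model (illustrative).
CAPABILITIES = {
    ThreatModel.WHITE_BOX: {"read_weights", "gradients", "fine_tune", "query"},
    ThreatModel.GREY_BOX: {"fine_tune", "query"},
    ThreatModel.BLACK_BOX: {"query"},
}

def allows(model: ThreatModel, capability: str) -> bool:
    """Return whether an attacker under `model` has the given capability."""
    return capability in CAPABILITIES[model]
```

The point of the intermediate row is that an attacker without weight access may still have strictly more power than the pure black-box threat model assumes.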
no code implementations • 26 Sep 2023 • Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate
This means that reward learning algorithms generally must be evaluated empirically, which is expensive, and that their failure modes are difficult to anticipate in advance.
no code implementations • 9 Jan 2023 • Lev McKinney, Yawen Duan, David Krueger, Adam Gleave
Our work focuses on demonstrating and studying the causes of these relearning failures in the domain of preference-based reward learning.
2 code implementations • 22 Nov 2022 • Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell
The imitation library provides open-source implementations of imitation and reward learning algorithms in PyTorch.
2 code implementations • 1 Nov 2022 • Tony T. Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell
The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against it.
no code implementations • 20 Aug 2022 • Erik Jenner, Herke van Hoof, Adam Gleave
In reinforcement learning, different reward functions can be equivalent in terms of the optimal policies they induce.
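A classic instance of this equivalence is potential-based reward shaping (Ng et al., 1999), which provably preserves optimal policies. A minimal sketch on a toy chain MDP (the MDP and the potential function are illustrative):

```python
GAMMA = 0.9
N = 4  # states in a chain; action 0 = step left, action 1 = step right

def step(s, a):
    """Deterministic transition, clipped at the chain's ends."""
    return max(0, s - 1) if a == 0 else min(N - 1, s + 1)

def base_reward(s, a, s2):
    return 1.0 if s2 == N - 1 else 0.0  # +1 for reaching the right end

PHI = [0.0, 5.0, -3.0, 2.0]  # an arbitrary potential function over states

def shaped_reward(s, a, s2):
    # Potential-based shaping: R'(s,a,s') = R(s,a,s') + gamma*Phi(s') - Phi(s)
    return base_reward(s, a, s2) + GAMMA * PHI[s2] - PHI[s]

def greedy_policy(reward, iters=200):
    """Value iteration, then the greedy policy for the resulting values."""
    v = [0.0] * N
    for _ in range(iters):
        v = [max(reward(s, a, step(s, a)) + GAMMA * v[step(s, a)]
                 for a in (0, 1)) for s in range(N)]
    return [max((0, 1), key=lambda a: reward(s, a, step(s, a))
                + GAMMA * v[step(s, a)]) for s in range(N)]
```

Despite the shaped reward assigning very different per-step numbers, both reward functions induce the same greedy policy (always step right).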
1 code implementation • 10 Aug 2022 • Pavel Czempin, Adam Gleave
Self-play reinforcement learning has achieved state-of-the-art, and often superhuman, performance in a variety of zero-sum games.
1 code implementation • 25 Mar 2022 • Erik Jenner, Adam Gleave
In many real-world applications, the reward function is too complex to be manually specified.
no code implementations • 22 Mar 2022 • Adam Gleave, Sam Toyer
Inverse Reinforcement Learning (IRL) algorithms infer a reward function that explains demonstrations provided by an expert acting in the environment.
no code implementations • 14 Mar 2022 • Adam Gleave, Geoffrey Irving
However, to solve a particular problem (such as text summarization), it is typically necessary to fine-tune language models on a task-specific dataset.
no code implementations • 14 Mar 2022 • Joar Skalse, Matthew Farrugia-Roberts, Stuart Russell, Alessandro Abate, Adam Gleave
It is often very challenging to manually design reward functions for complex, real-world tasks.
1 code implementation • 10 Dec 2020 • Eric J. Michaud, Adam Gleave, Stuart Russell
However, current techniques for reward learning may fail to produce reward functions which accurately reflect user preferences.
2 code implementations • 2 Dec 2020 • Pedro Freire, Adam Gleave, Sam Toyer, Stuart Russell
We evaluate a range of common reward and imitation learning algorithms on our tasks.
1 code implementation • ICLR 2021 • Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike
However, evaluating a learned reward function by training a policy on it cannot distinguish between the learned reward failing to reflect user preferences and the policy optimization process failing to optimize the learned reward.
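One alternative this points toward is comparing reward functions directly, with no policy optimization in the loop. A simplified sketch: evaluate two hypothetical reward functions on sampled transition features and measure a Pearson-correlation distance between them (the paper's EPIC distance additionally canonicalizes rewards to strip away shaping; that step is omitted here):

```python
import math
import random

random.seed(0)

def reward_a(x):
    # Hypothetical "ground truth" reward over transition features.
    return 2.0 * x[0] - x[1]

def reward_b(x):
    # A positive affine transform of reward_a: same preference ordering.
    return 3.0 * reward_a(x) + 7.0

def pearson_distance(r1, r2):
    """sqrt((1 - rho)/2), a distance in [0, 1] from Pearson correlation rho."""
    n = len(r1)
    m1, m2 = sum(r1) / n, sum(r2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(r1, r2))
    var1 = sum((a - m1) ** 2 for a in r1)
    var2 = sum((b - m2) ** 2 for b in r2)
    rho = cov / math.sqrt(var1 * var2)
    return math.sqrt(max(0.0, (1.0 - rho) / 2.0))

xs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]
d = pearson_distance([reward_a(x) for x in xs], [reward_b(x) for x in xs])
```

Because reward_b is a positive rescaling of reward_a, the distance is essentially zero: the two rewards agree on which behaviors are better, even though their raw values differ everywhere.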
2 code implementations • ICLR 2020 • Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell
Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers.
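A minimal illustration of such an observation-space perturbation, in the style of a fast-gradient-sign attack on a toy linear policy (the weights and epsilon are chosen purely for illustration):

```python
# Toy linear "policy": scores for 2 actions from a 4-dim observation.
W = [[1.0, -2.0, 0.5, 0.0],
     [-1.0, 1.0, 0.0, 2.0]]

def logits(obs):
    return [sum(w * o for w, o in zip(row, obs)) for row in W]

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

def sign(x):
    return (x > 0) - (x < 0)

def fgsm_observation_attack(obs, eps=0.3):
    """FGSM-style step: move the observation along the sign of the gradient
    of (runner-up logit - chosen logit), trying to flip the chosen action."""
    scores = logits(obs)
    best = argmax(scores)
    worst = min(range(len(scores)), key=scores.__getitem__)
    grad = [W[worst][i] - W[best][i] for i in range(len(obs))]
    return [o + eps * sign(g) for o, g in zip(obs, grad)]

obs = [1.0, 0.0, 0.0, 0.2]
adv = fgsm_observation_attack(obs)
```

A bounded, sign-only nudge of the observation is enough to change which action the policy selects, mirroring adversarial examples for classifiers.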
1 code implementation • 24 Oct 2018 • Aaron Tucker, Adam Gleave, Stuart Russell
Deep reinforcement learning achieves superhuman performance in a range of video game environments, but requires that a designer manually specify a reward function.
1 code implementation • 9 Sep 2018 • Sören Mindermann, Rohin Shah, Adam Gleave, Dylan Hadfield-Menell
We propose structuring this process as a series of queries asking the user to compare between different reward functions.
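Such comparison queries can drive a Bayesian update over a hypothesis space of candidate reward functions. A toy sketch under a Boltzmann-rational user model (the scoring rule and hypothesis space here are illustrative, not the paper's):

```python
import math

# Toy hypothesis space over the user's true reward weights (2 features).
HYPOTHESES = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def p_prefers_a(true_w, option_a, option_b, beta=5.0):
    """Boltzmann-rational user: more likely to pick the candidate reward
    whose weights better align with the true reward (toy scoring rule)."""
    margin = dot(true_w, option_a) - dot(true_w, option_b)
    return 1.0 / (1.0 + math.exp(-beta * margin))

def update(prior, option_a, option_b, preferred_a):
    """Bayes update of the posterior over HYPOTHESES given one answer."""
    like = [p_prefers_a(w, option_a, option_b) for w in HYPOTHESES]
    if not preferred_a:
        like = [1.0 - p for p in like]
    post = [pr * li for pr, li in zip(prior, like)]
    z = sum(post)
    return [p / z for p in post]

prior = [1.0 / 3] * 3
# The user prefers candidate reward (1, 0) over (0, 1):
posterior = update(prior, (1.0, 0.0), (0.0, 1.0), preferred_a=True)
```

A single comparison shifts posterior mass toward hypotheses consistent with the answer; the indifferent hypothesis (0.7, 0.7) is left relatively untouched.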
1 code implementation • 22 May 2018 • Adam Gleave, Oliver Habryka
Multi-task Inverse Reinforcement Learning (IRL) is the problem of inferring multiple reward functions from expert demonstrations.