Search Results for author: Alan Chan

Found 16 papers, 1 paper with code

Visibility into AI Agents

no code implementations • 23 Jan 2024 • Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, Markus Anderljung

Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks.

Informativeness

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

no code implementations • 22 Dec 2023 • Alan Chan, Ben Bucknall, Herbie Bradley, David Krueger

Public release of the weights of pretrained foundation models, otherwise known as downloadable access (Solaiman, 2023), enables fine-tuning without the prohibitive expense of pretraining.

An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI

no code implementations • 22 Oct 2023 • Ross Gruetzemacher, Alan Chan, Kevin Frazier, Christy Manning, Štěpán Los, James Fox, José Hernández-Orallo, John Burden, Matija Franklin, Clíodhna Ní Ghuidhir, Mark Bailey, Daniel Eth, Toby Pilditch, Kyle Kilian

Given rapid progress toward advanced AI and risks from frontier AI systems (advanced AI systems pushing the boundaries of the AI capabilities frontier), the creation and implementation of AI governance and regulatory schemes deserves prioritization and substantial investment.

Welfare Diplomacy: Benchmarking Language Model Cooperation

1 code implementation • 13 Oct 2023 • Gabriel Mukobi, Hannah Erlebach, Niklas Lauffer, Lewis Hammond, Alan Chan, Jesse Clifton

The growing capabilities and increasingly widespread deployment of AI systems necessitate robust benchmarks for measuring their cooperative capabilities.

Benchmarking, Language Modelling

Towards the Scalable Evaluation of Cooperativeness in Language Models

no code implementations • 16 Mar 2023 • Alan Chan, Maxime Riché, Jesse Clifton

Since desired behaviour in an interaction depends upon precise game-theoretic structure, we focus on generating scenarios with particular structures with both crowdworkers and a language model.

Language Modelling

Scoring Rules for Performative Binary Prediction

no code implementations • 5 Jul 2022 • Alan Chan

We construct a model of expert prediction where predictions can influence the state of the world.

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

no code implementations • 17 Jul 2021 • Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White

Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification.

Policy Gradient Methods

Parameter-free Gradient Temporal Difference Learning

no code implementations • 10 May 2021 • Andrew Jacobsen, Alan Chan

In parallel, progress in online learning has provided parameter-free methods that achieve minimax optimal guarantees up to logarithmic terms, but their application in reinforcement learning has yet to be explored.

Reinforcement Learning (RL)

Incremental Policy Gradients for Online Reinforcement Learning Control

no code implementations • 1 Jan 2021 • Kristopher De Asis, Alan Chan, Yi Wan, Richard S. Sutton

Our emphasis is on the first approach in this work, detailing an incremental policy gradient update which neither waits until the end of the episode, nor relies on learning estimates of the return.

Policy Gradient Methods, Reinforcement Learning

Inverse Policy Evaluation for Value-based Sequential Decision-making

no code implementations • 26 Aug 2020 • Alan Chan, Kris de Asis, Richard S. Sutton

In this work, we explore the use of inverse policy evaluation, the process of solving for a likely policy given a value function, for deriving behavior from a value function.

Decision Making, Q-Learning

Training Recurrent Neural Networks Online by Learning Explicit State Variables

no code implementations • ICLR 2020 • Somjit Nath, Vincent Liu, Alan Chan, Xin Li, Adam White, Martha White

Recurrent neural networks (RNNs) allow an agent to construct a state representation from a stream of experience, which is essential in partially observable problems.

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

no code implementations • 9 Sep 2019 • Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves

We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a fixed number of future time steps.

Q-Learning, Reinforcement Learning
