Search Results for author: Robert Kirk

Found 10 papers, 6 papers with code

Leading the Pack: N-player Opponent Shaping

no code implementations · 19 Dec 2023 · Alexandra Souly, Timon Willi, Akbir Khan, Robert Kirk, Chris Lu, Edward Grefenstette, Tim Rocktäschel

We evaluate on over 4 different environments, varying the number of players from 3 to 5, and demonstrate that model-based OS methods converge to equilibrium with better global welfare than naive learning.

Generalization to New Sequential Decision Making Tasks with In-Context Learning

1 code implementation · 6 Dec 2023 · Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

By training on large diverse offline datasets, our model is able to learn new MiniHack and Procgen tasks without any weight updates from just a handful of demonstrations.

Decision Making · In-Context Learning

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

no code implementations · 21 Nov 2023 · Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger

Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy.

Network Pruning

Understanding the Effects of RLHF on LLM Generalisation and Diversity

1 code implementation · 10 Oct 2023 · Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu

OOD generalisation is crucial given the wide range of real-world scenarios in which these models are being used, while output diversity refers to the model's ability to generate varied outputs and is important for a variety of use cases.

Instruction Following
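The snippet above distinguishes OOD generalisation from output diversity, i.e. how varied a model's generations are. One common way to quantify that (illustrative only; not necessarily the metric used in the paper) is the distinct-n statistic, the fraction of n-grams that are unique across a set of sampled outputs:

```python
def distinct_n(texts, n=2):
    """Fraction of unique n-grams across a set of sampled outputs.

    A standard lexical-diversity proxy: 1.0 means every n-gram is
    distinct; values near 0 mean the samples repeat themselves.
    """
    ngrams = []
    for t in texts:
        toks = t.split()
        ngrams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

# Three hypothetical samples for one prompt; "the cat" repeats,
# so 5 of the 6 bigrams are distinct.
samples = ["the cat sat", "the cat ran", "a dog ran"]
print(distinct_n(samples))
```

A lower distinct-n for the RLHF-tuned model than for its supervised-fine-tuned counterpart would indicate the kind of diversity reduction the paper studies.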

Reward Model Ensembles Help Mitigate Overoptimization

1 code implementation · 4 Oct 2023 · Thomas Coste, Usman Anwar, Robert Kirk, David Krueger

Gao et al. (2023) studied this phenomenon in a synthetic human feedback setup with a significantly larger "gold" reward model acting as the true reward (instead of humans) and showed that overoptimization remains a persistent problem regardless of the size of the proxy reward model and training data used.

Model Optimization
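The idea behind ensembling reward models is to optimize against a conservative aggregate of several proxy reward models rather than a single one, so the policy cannot exploit any one model's errors. A minimal sketch of two such aggregations (the scores and penalty weight below are hypothetical; the paper's exact objectives may differ):

```python
def worst_case(scores):
    """Worst-case aggregation: score a sample by the most
    pessimistic ensemble member, so exploiting one model's
    blind spot does not raise the optimized reward."""
    return min(scores)

def uncertainty_penalized(scores, beta=0.5):
    """Mean ensemble reward minus a disagreement (variance)
    penalty: high reward only counts when the members agree."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    return mean - beta * var

# One proxy score per ensemble member for a single sample.
ensemble_scores = [1.2, 0.9, 1.5]
print(worst_case(ensemble_scores))             # most pessimistic member
print(uncertainty_penalized(ensemble_scores))  # agreement-weighted mean
```

Under either aggregation, a sample that only one reward model rates highly is down-weighted, which is the mechanism that damps overoptimization.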

Domain Generalization for Robust Model-Based Offline Reinforcement Learning

no code implementations · 27 Nov 2022 · Alan Clark, Shoaib Ahmed Siddiqui, Robert Kirk, Usman Anwar, Stephen Chung, David Krueger

Existing offline reinforcement learning (RL) algorithms typically assume that training data is either: 1) generated by a known policy, or 2) of entirely unknown origin.

Domain Generalization · Offline RL (+2)

Graph Backup: Data Efficient Backup Exploiting Markovian Transitions

1 code implementation · 31 May 2022 · Zhengyao Jiang, Tianjun Zhang, Robert Kirk, Tim Rocktäschel, Edward Grefenstette

In this paper, we treat the transition data of the MDP as a graph, and define a novel backup operator, Graph Backup, which exploits this graph structure for better value estimation.

Atari Games · counterfactual (+2)
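The core intuition from the snippet above is that pooling transitions into a graph keyed by state lets every trajectory that visits a state contribute to the same value estimate, instead of each trajectory backing up in isolation. A toy sketch of that pooling (a deterministic single-action MDP; the paper's actual Graph Backup operator is more general):

```python
from collections import defaultdict

def graph_backup(transitions, gamma=0.9, sweeps=50):
    """Toy graph-style value backup.

    transitions: list of (state, reward, next_state) tuples pooled
    from any number of trajectories. Builds a graph keyed by state
    and sweeps Bellman backups over it, so shared states receive
    value information from every trajectory that visited them.
    """
    graph = defaultdict(list)
    for s, r, s2 in transitions:
        graph[s].append((r, s2))
    V = defaultdict(float)  # unseen states (terminals) default to 0
    for _ in range(sweeps):
        for s, edges in graph.items():
            # Average over observed outgoing edges of this state.
            V[s] = sum(r + gamma * V[s2] for r, s2 in edges) / len(edges)
    return dict(V)

# Two trajectories (A -> B -> T and C -> B) share state "B":
# both inform V["B"], and that estimate flows back to A and C.
data = [("A", 0.0, "B"), ("B", 1.0, "T"), ("C", 0.0, "B")]
V = graph_backup(data)
```

Compared with a per-trajectory (tree-style) backup, the shared node "B" here is estimated once from all of its visits, which is the data-efficiency the title refers to.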

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

1 code implementation · 27 Sep 2021 · Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel

By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack allows designing custom RL testbeds that are fast and convenient to use.

NetHack · reinforcement-learning (+2)
