Search Results for author: Robert Kirk

Found 10 papers, 6 papers with code

Leading the Pack: N-player Opponent Shaping

no code implementations • 19 Dec 2023 • Alexandra Souly, Timon Willi, Akbir Khan, Robert Kirk, Chris Lu, Edward Grefenstette, Tim Rocktäschel

We evaluate on over 4 different environments, varying the number of players from 3 to 5, and demonstrate that model-based OS methods converge to equilibrium with better global welfare than naive learning.

Paper
Add Code

Generalization to New Sequential Decision Making Tasks with In-Context Learning

1 code implementation • 6 Dec 2023 • Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

By training on large diverse offline datasets, our model is able to learn new MiniHack and Procgen tasks without any weight updates from just a handful of demonstrations.

Decision Making In-Context Learning

451

Paper
Code

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

no code implementations • 21 Nov 2023 • Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger

Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy.

Network Pruning

Paper
Add Code

Understanding the Effects of RLHF on LLM Generalisation and Diversity

1 code implementation • 10 Oct 2023 • Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu

OOD generalisation is crucial given the wide range of real-world scenarios in which these models are being used, while output diversity refers to the model's ability to generate varied outputs and is important for a variety of use cases.

Instruction Following

Paper
Code

Reward Model Ensembles Help Mitigate Overoptimization

1 code implementation • 4 Oct 2023 • Thomas Coste, Usman Anwar, Robert Kirk, David Krueger

Gao et al. (2023) studied this phenomenon in a synthetic human feedback setup with a significantly larger "gold" reward model acting as the true reward (instead of humans) and showed that overoptimization remains a persistent problem regardless of the size of the proxy reward model and training data used.

Model Optimization

Paper
Code

Domain Generalization for Robust Model-Based Offline Reinforcement Learning

no code implementations • 27 Nov 2022 • Alan Clark, Shoaib Ahmed Siddiqui, Robert Kirk, Usman Anwar, Stephen Chung, David Krueger

Existing offline reinforcement learning (RL) algorithms typically assume that training data is either: 1) generated by a known policy, or 2) of entirely unknown origin.

Domain Generalization Offline RL +2

Paper
Add Code

Graph Backup: Data Efficient Backup Exploiting Markovian Transitions

1 code implementation • 31 May 2022 • Zhengyao Jiang, Tianjun Zhang, Robert Kirk, Tim Rocktäschel, Edward Grefenstette

In this paper, we treat the transition data of the MDP as a graph, and define a novel backup operator, Graph Backup, which exploits this graph structure for better value estimation.

Atari Games counterfactual +2

Paper
Code

Insights From the NeurIPS 2021 NetHack Challenge

1 code implementation • 22 Mar 2022 • Eric Hambro, Sharada Mohanty, Dmitrii Babaev, Minwoo Byeon, Dipam Chakraborty, Edward Grefenstette, Minqi Jiang, DaeJin Jo, Anssi Kanervisto, Jongmin Kim, Sungwoong Kim, Robert Kirk, Vitaly Kurin, Heinrich Küttler, Taehwon Kwon, Donghoon Lee, Vegard Mella, Nantas Nardelli, Ivan Nazarov, Nikita Ovsov, Jack Parker-Holder, Roberta Raileanu, Karolis Ramanauskas, Tim Rocktäschel, Danielle Rothermel, Mikayel Samvelyan, Dmitry Sorokin, Maciej Sypetkowski, Michał Sypetkowski

In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge.

NetHack Reinforcement Learning (RL)

Paper
Code

A Survey of Zero-shot Generalisation in Deep Reinforcement Learning

no code implementations • 18 Nov 2021 • Robert Kirk, Amy Zhang, Edward Grefenstette, Tim Rocktäschel

This survey is an overview of this nascent field.

Offline RL reinforcement-learning +1

Paper
Add Code

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

1 code implementation • 27 Sep 2021 • Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel

By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack allows designing custom RL testbeds that are fast and convenient to use.

NetHack reinforcement-learning +2

451

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.