2 code implementations • 19 Nov 2024 • Laura Ruis, Maximilian Mozes, Juhan Bae, Siddhartha Rao Kamalakara, Dwarak Talupuru, Acyr Locatelli, Robert Kirk, Tim Rocktäschel, Edward Grefenstette, Max Bartolo
We find that, while the models rely on mostly distinct sets of data for each factual question, a document often has a similar influence across different reasoning questions within the same task, indicating the presence of procedural knowledge.
no code implementations • 19 Sep 2024 • Eduardo Pignatelli, Johan Ferret, Tim Rockäschel, Edward Grefenstette, Davide Paglieri, Samuel Coward, Laura Toni
The temporal credit assignment problem is a central challenge in Reinforcement Learning (RL), concerned with attributing the appropriate influence to each actions in a trajectory for their ability to achieve a goal.
1 code implementation • 9 Feb 2024 • Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel R. Bowman, Tim Rocktäschel, Ethan Perez
In anticipation of this, we ask: can weaker models assess the correctness of stronger models?
no code implementations • 19 Dec 2023 • Alexandra Souly, Timon Willi, Akbir Khan, Robert Kirk, Chris Lu, Edward Grefenstette, Tim Rocktäschel
We evaluate on over 4 different environments, varying the number of players from 3 to 5, and demonstrate that model-based OS methods converge to equilibrium with better global welfare than naive learning.
no code implementations • 19 Dec 2023 • Akbir Khan, Timon Willi, Newton Kwan, Andrea Tacchetti, Chris Lu, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster
In multi-agent settings with mixed incentives, methods developed for zero-sum games have been shown to lead to detrimental outcomes.
no code implementations • 5 Dec 2023 • Zhengyao Jiang, Yingchen Xu, Nolan Wagener, Yicheng Luo, Michael Janner, Edward Grefenstette, Tim Rocktäschel, Yuandong Tian
However, the extensive collection of human motion-captured data and the derived datasets of humanoid trajectories, such as MoCapAct, paves the way to tackle these challenges.
1 code implementation • 21 Nov 2023 • Minqi Jiang, Michael Dennis, Edward Grefenstette, Tim Rocktäschel
This compute requirement is a major obstacle to rapid innovation for the field.
no code implementations • 21 Nov 2023 • Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger
Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy.
1 code implementation • 10 Oct 2023 • Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu
OOD generalisation is crucial given the wide range of real-world scenarios in which these models are being used, while output diversity refers to the model's ability to generate varied outputs and is important for a variety of use cases.
no code implementations • 30 Mar 2023 • Yicheng Luo, Jackie Kay, Edward Grefenstette, Marc Peter Deisenroth
While offline RL algorithms can in principle be used for finetuning, in practice, their online performance improves slowly.
1 code implementation • 24 Mar 2023 • Yicheng Luo, Zhengyao Jiang, samuel cohen, Edward Grefenstette, Marc Peter Deisenroth
In this paper, we introduce Optimal Transport Reward labeling (OTR), an algorithm that assigns rewards to offline trajectories, with a few high-quality demonstrations.
no code implementations • 15 Nov 2022 • Minqi Jiang, Tim Rocktäschel, Edward Grefenstette
We are at the cusp of a transition from "learning from data" to "learning what data to learn from" as a central focus of artificial intelligence (AI) research.
1 code implementation • NeurIPS 2023 • Laura Ruis, Akbir Khan, Stella Biderman, Sara Hooker, Tim Rocktäschel, Edward Grefenstette
We present our findings as the starting point for further research into evaluating how LLMs interpret language in context and to drive the development of more pragmatic and useful models of human discourse.
no code implementations • 23 Oct 2022 • Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip J. Ball, Oleh Rybkin, Stephen J. Roberts, Tim Rocktäschel, Edward Grefenstette
We then present CASCADE, a novel approach for self-supervised exploration in this new setting.
1 code implementation • 30 Sep 2022 • Victor Zhong, Jesse Mu, Luke Zettlemoyer, Edward Grefenstette, Tim Rocktäschel
Recent work has shown that augmenting environments with language descriptions improves policy learning.
1 code implementation • 22 Aug 2022 • Zhengyao Jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian
Planning-based reinforcement learning has shown strong performance in tasks in discrete and low-dimensional continuous action spaces.
2 code implementations • 23 Jul 2022 • Michael Matthews, Mikayel Samvelyan, Jack Parker-Holder, Edward Grefenstette, Tim Rocktäschel
In this paper, we investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments with large state-action spaces and sparse rewards.
1 code implementation • 11 Jul 2022 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Andrei Lupu, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster
Problematically, in partially-observable or stochastic settings, optimal policies may depend on the ground-truth distribution over aleatoric parameters of the environment in the intended deployment setting, while curriculum learning necessarily shifts the training distribution.
1 code implementation • 31 May 2022 • Zhengyao Jiang, Tianjun Zhang, Robert Kirk, Tim Rocktäschel, Edward Grefenstette
In this paper, we treat the transition data of the MDP as a graph, and define a novel backup operator, Graph Backup, which exploits this graph structure for better value estimation.
1 code implementation • 22 Mar 2022 • Eric Hambro, Sharada Mohanty, Dmitrii Babaev, Minwoo Byeon, Dipam Chakraborty, Edward Grefenstette, Minqi Jiang, DaeJin Jo, Anssi Kanervisto, Jongmin Kim, Sungwoong Kim, Robert Kirk, Vitaly Kurin, Heinrich Küttler, Taehwon Kwon, Donghoon Lee, Vegard Mella, Nantas Nardelli, Ivan Nazarov, Nikita Ovsov, Jack Parker-Holder, Roberta Raileanu, Karolis Ramanauskas, Tim Rocktäschel, Danielle Rothermel, Mikayel Samvelyan, Dmitry Sorokin, Maciej Sypetkowski, Michał Sypetkowski
In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge.
3 code implementations • 2 Mar 2022 • Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex.
1 code implementation • 17 Feb 2022 • Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette
Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse.
no code implementations • 18 Nov 2021 • Robert Kirk, Amy Zhang, Edward Grefenstette, Tim Rocktäschel
This survey is an overview of this nascent field.
4 code implementations • NeurIPS 2021 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria.
no code implementations • 29 Sep 2021 • Jack Parker-Holder, Minqi Jiang, Michael D Dennis, Mikayel Samvelyan, Jakob Nicolaus Foerster, Edward Grefenstette, Tim Rocktäschel
Deep Reinforcement Learning (RL) has recently produced impressive results in a series of settings such as games and robotics.
1 code implementation • 27 Sep 2021 • Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel
By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack allows designing custom RL testbeds that are fast and convenient to use.
no code implementations • NeurIPS Workshop ICBINB 2021 • Iryna Korshunova, Minqi Jiang, Jack Parker-Holder, Tim Rocktäschel, Edward Grefenstette
Prioritized Level Replay (PLR) has been shown to induce adaptive curricula that improve the sample-efficiency and generalization of reinforcement learning policies in environments featuring multiple tasks or levels.
4 code implementations • 8 Oct 2020 • Minqi Jiang, Edward Grefenstette, Tim Rocktäschel
Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning.
2 code implementations • ICML 2020 • Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp, Edward Grefenstette, Tim Rocktäschel
Attempts to render deep learning models interpretable, data-efficient, and robust have seen some success through hybridisation with rule-based systems, for example, in Neural Theorem Provers (NTPs).
Ranked #1 on
Relational Reasoning
on CLUTRR (k=3)
3 code implementations • NeurIPS 2020 • Heinrich Küttler, Nantas Nardelli, Alexander H. Miller, Roberta Raileanu, Marco Selvatici, Edward Grefenstette, Tim Rocktäschel
Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack.
Ranked #1 on
NetHack Score
on NetHack Learning Environment
5 code implementations • ICLR 2021 • Andres Campero, Roberta Raileanu, Heinrich Küttler, Joshua B. Tenenbaum, Tim Rocktäschel, Edward Grefenstette
A key challenge for reinforcement learning (RL) consists of learning in environments with sparse extrinsic rewards.
no code implementations • ICLR 2020 • Victor Zhong, Tim Rocktäschel, Edward Grefenstette
In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments.
3 code implementations • 17 Dec 2019 • Pasquale Minervini, Matko Bošnjak, Tim Rocktäschel, Sebastian Riedel, Edward Grefenstette
Reasoning with knowledge expressed in natural language and Knowledge Bases (KBs) is a major challenge for Artificial Intelligence, with applications in machine reading, dialogue, and question answering.
Ranked #3 on
Link Prediction
on FB122
1 code implementation • 18 Oct 2019 • Victor Zhong, Tim Rocktäschel, Edward Grefenstette
In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments.
3 code implementations • 8 Oct 2019 • Heinrich Küttler, Nantas Nardelli, Thibaut Lavril, Marco Selvatici, Viswanath Sivakumar, Tim Rocktäschel, Edward Grefenstette
TorchBeast is a platform for reinforcement learning (RL) research in PyTorch.
3 code implementations • 3 Oct 2019 • Edward Grefenstette, Brandon Amos, Denis Yarats, Phu Mon Htut, Artem Molchanov, Franziska Meier, Douwe Kiela, Kyunghyun Cho, Soumith Chintala
Many (but not all) approaches self-qualifying as "meta-learning" in deep learning and reinforcement learning fit a common pattern of approximating the solution to a nested optimization problem.
no code implementations • 25 Sep 2019 • Sarah Bechtle, Artem Molchanov, Yevgen Chebotar, Edward Grefenstette, Ludovic Righetti, Gaurav Sukhatme, Franziska Meier
We present a meta-learning method for learning parametric loss functions that can generalize across different tasks and model architectures.
1 code implementation • 12 Jun 2019 • Sarah Bechtle, Artem Molchanov, Yevgen Chebotar, Edward Grefenstette, Ludovic Righetti, Gaurav Sukhatme, Franziska Meier
This information shapes the learned loss function such that the environment does not need to provide this information during meta-test time.
no code implementations • 10 Jun 2019 • Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel
To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand.
no code implementations • ICLR 2019 • Pasquale Minervini, Matko Bosnjak, Tim Rocktäschel, Edward Grefenstette, Sebastian Riedel
Reasoning over text and Knowledge Bases (KBs) is a major challenge for Artificial Intelligence, with applications in machine reading, dialogue, and question answering.
no code implementations • CVPR 2019 • Chenglong Wang, Rudy Bunel, Krishnamurthy Dvijotham, Po-Sen Huang, Edward Grefenstette, Pushmeet Kohli
This behavior can have severe consequences such as usage of increased computation and induce faults in downstream modules that expect outputs of a certain length.
7 code implementations • ICLR 2019 • David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli
The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure-modes of different architectures, as well as evaluate their ability to compose and relate knowledge and learned processes.
Ranked #2 on
Question Answering
on Mathematics Dataset
3 code implementations • 4 Dec 2018 • Thomas Kipf, Yujia Li, Hanjun Dai, Vinicius Zambaldi, Alvaro Sanchez-Gonzalez, Edward Grefenstette, Pushmeet Kohli, Peter Battaglia
We introduce Compositional Imitation Learning and Execution (CompILE): a framework for learning reusable, variable-length segments of hierarchically-structured behavior from demonstration data.
no code implementations • ICLR 2019 • Edward Grefenstette, Robert Stanforth, Brendan O'Donoghue, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli
We show that increasing the number of parameters in adversarially-trained models increases their robustness, and in particular that ensembling smaller models while adversarially training the entire ensemble as a single model is a more efficient way of spending said budget than simply using a larger single model.
1 code implementation • ICLR 2019 • Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette
Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards.
no code implementations • ICLR 2018 • Richard Evans, David Saxton, David Amos, Pushmeet Kohli, Edward Grefenstette
We introduce a new dataset of logical entailments for the purpose of measuring models' ability to capture and exploit the structure of logical expressions against an entailment prediction task.
2 code implementations • TACL 2018 • Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette
Reading comprehension (RC)---in contrast to information retrieval---requires integrating information and reasoning about events, entities, and their relations across a full document.
Ranked #9 on
Question Answering
on NarrativeQA
(BLEU-1 metric)
3 code implementations • 13 Nov 2017 • Richard Evans, Edward Grefenstette
Artificial Neural Networks are powerful function approximators capable of modelling solutions to a wide variety of problems, both supervised and unsupervised.
no code implementations • ACL 2017 • Xiaodan Zhu, Edward Grefenstette
Learning representation to model the meaning of text has been a core problem in NLP.
1 code implementation • ICML 2017 • Yishu Miao, Edward Grefenstette, Phil Blunsom
Topic models have been widely explored as probabilistic generative models of documents.
Ranked #2 on
Topic Models
on 20NewsGroups
no code implementations • ICLR 2018 • Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio
Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare.
Ranked #48 on
Question Answering
on SQuAD1.1 dev
no code implementations • 28 Nov 2016 • Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang Ling
We use reinforcement learning to learn tree-structured neural networks for computing representations of natural language sentences.
no code implementations • 8 Nov 2016 • Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, Tomas Kocisky
We formulate sequence to sequence transduction as a noisy channel decoding problem and use recurrent neural networks to parameterise the source and channel models.
no code implementations • EMNLP 2016 • Tomáš Kočiský, Gábor Melis, Edward Grefenstette, Chris Dyer, Wang Ling, Phil Blunsom, Karl Moritz Hermann
We present a novel semi-supervised approach for sequence transduction and apply it to semantic parsing.
2 code implementations • ACL 2016 • Wang Ling, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Andrew Senior, Fumin Wang, Phil Blunsom
Many language generation tasks require the production of text conditioned on both structured and unstructured inputs.
Ranked #10 on
Code Generation
on Django
7 code implementations • 22 Sep 2015 • Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom
We extend this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases.
Ranked #83 on
Natural Language Inference
on SNLI
11 code implementations • NeurIPS 2015 • Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, Phil Blunsom
Teaching machines to read natural language documents remains an elusive challenge.
Ranked #13 on
Question Answering
on CNN / Daily Mail
4 code implementations • NeurIPS 2015 • Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, Phil Blunsom
Recently, strong results have been demonstrated by Deep Recurrent Neural Networks on natural language transduction problems.
no code implementations • 15 Nov 2014 • Jianpeng Cheng, Dimitri Kartsaklis, Edward Grefenstette
This paper aims to explore the effect of prior disambiguation on neural network- based compositional models, with the hope that better semantic representations for text compounds can be produced.
no code implementations • WS 2014 • Edward Grefenstette, Phil Blunsom, Nando de Freitas, Karl Moritz Hermann
Many successful approaches to semantic parsing build on top of the syntactic analysis of text, and make use of distributional representations or statistical models to match parses to ontology-specific queries.
6 code implementations • ACL 2014 • Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom
The ability to accurately represent sentences is central to language understanding.
no code implementations • 6 Nov 2013 • Edward Grefenstette
This thesis focuses on a particular approach to this compositionality problem, namely using the categorical framework developed by Coecke, Sadrzadeh, and Clark, which combines syntactic analysis formalisms with distributional semantic representations of meaning to produce syntactically motivated composition operations.
no code implementations • 10 Jun 2013 • Karl Moritz Hermann, Edward Grefenstette, Phil Blunsom
With the increasing empirical success of distributional models of compositional semantics, it is timely to consider the types of textual logic that such models are capable of capturing.
no code implementations • 2 May 2013 • Stephen Clark, Bob Coecke, Edward Grefenstette, Stephen Pulman, Mehrnoosh Sadrzadeh
We discuss an algorithm which produces the meaning of a sentence given meanings of its words, and its resemblance to quantum teleportation.
no code implementations • SEMEVAL 2013 • Edward Grefenstette
This paper seeks to bring this reconciliation one step further by showing how the mathematical constructs commonly used in compositional distributional models, such as tensors and matrices, can be used to simulate different aspects of predicate logic.
1 code implementation • 20 Jun 2011 • Edward Grefenstette, Mehrnoosh Sadrzadeh
The evaluation is based on the word disambiguation task developed by Mitchell and Lapata (2008) for intransitive sentences, and on a similar new experiment designed for transitive sentences.