Search Results for author: Roberta Raileanu

Found 32 papers, 24 papers with code

Fast Adaptation to New Environments via Policy-Dynamics Value Functions

no code implementations ICML 2020 Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus

An ensemble of conventional RL policies is used to gather experience on training environments, from which embeddings of both policies and environments can be learned.
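
A minimal sketch of the policy-dynamics value function idea described above, assuming the policy and environment encoders already exist: a network scores (policy embedding, environment embedding) pairs, and at test time the stored policy whose embedding scores highest under the new environment's inferred embedding is selected. All names here (`PDVF`, `select_policy`) are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class PDVF(nn.Module):
    """Toy value function over (policy embedding, env embedding) pairs."""
    def __init__(self, policy_dim, env_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(policy_dim + env_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, policy_emb, env_emb):
        return self.net(torch.cat([policy_emb, env_emb], dim=-1))

def select_policy(pdvf, policy_embs, env_emb):
    """Score every known policy embedding under the inferred environment
    embedding; return the index of the most promising policy."""
    env_batch = env_emb.unsqueeze(0).expand(policy_embs.size(0), -1)
    return pdvf(policy_embs, env_batch).squeeze(-1).argmax().item()
```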

Teaching Large Language Models to Reason with Reinforcement Learning

no code implementations 7 Mar 2024 Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu

Surprisingly, we find the sample complexity of Expert Iteration is similar to that of PPO, requiring at most on the order of $10^6$ samples to converge from a pretrained checkpoint.

reinforcement-learning
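
As a reference point for the comparison above, a schematic of the Expert Iteration loop: sample candidate solutions, keep the ones the reward signal marks correct, fine-tune on the filtered set, and repeat. The `generate`, `is_correct`, and `finetune` callables are hypothetical stand-ins for the model-specific pieces, not the paper's code.

```python
def expert_iteration(model, problems, generate, is_correct, finetune,
                     n_rounds=4, k_samples=8):
    """One common formulation of Expert Iteration for reasoning tasks:
    rejection-sample solutions, then fine-tune on the successes."""
    for _ in range(n_rounds):
        accepted = []
        for problem in problems:
            for _ in range(k_samples):
                solution = generate(model, problem)
                if is_correct(problem, solution):
                    accepted.append((problem, solution))
        model = finetune(model, accepted)
    return model
```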

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

no code implementations26 Feb 2024 Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance.

Question Answering
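
Rainbow Teaming casts adversarial prompt generation as quality-diversity search. A hedged sketch of that loop, with `features`, `mutate`, and `attack_success` as stand-ins for the LLM-driven descriptor, mutation, and judge components: an archive keeps the most effective prompt found for each feature cell.

```python
import random

def rainbow_teaming(seed_prompts, features, mutate, attack_success, steps=1000):
    """MAP-Elites-style search: one archive cell per feature descriptor
    (e.g., risk category x attack style); a mutated prompt replaces the
    incumbent in its cell only if it attacks more successfully."""
    archive = {features(p): (p, attack_success(p)) for p in seed_prompts}
    for _ in range(steps):
        parent, _ = random.choice(list(archive.values()))
        child = mutate(parent)            # an LLM proposes a variation
        score = attack_success(child)     # a judge scores the attack
        cell = features(child)
        if cell not in archive or score > archive[cell][1]:
            archive[cell] = (child, score)
    return archive
```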

TOOLVERIFIER: Generalization to New Tools via Self-Verification

1 code implementation 21 Feb 2024 Dheeraj Mekala, Jason Weston, Jack Lanchantin, Roberta Raileanu, Maria Lomeli, Jingbo Shang, Jane Dwivedi-Yu

Teaching language models to use tools is an important milestone towards building general assistants, but remains an open problem.

The Generalization Gap in Offline Reinforcement Learning

1 code implementation 10 Dec 2023 Ishita Mediratta, Qingfei You, Minqi Jiang, Roberta Raileanu

Our experiments reveal that existing offline learning algorithms struggle to match the performance of online RL on both train and test environments.

Offline RL, reinforcement-learning +1

Generalization to New Sequential Decision Making Tasks with In-Context Learning

1 code implementation 6 Dec 2023 Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

By training on large diverse offline datasets, our model is able to learn new MiniHack and Procgen tasks without any weight updates from just a handful of demonstrations.

Decision Making, In-Context Learning
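
A sketch of how such few-shot contexts can be assembled for a causal transformer policy, under the assumption that observations and actions are tokenized by injected encoders (`encode_obs` and `encode_action` are hypothetical): demonstrations are concatenated ahead of the trajectory so far, and the model predicts the next action token.

```python
def build_context(demos, current_traj, encode_obs, encode_action, max_len):
    """Few-shot prompt for an in-context decision-making transformer:
    (obs, action) tokens from a handful of demonstrations, followed by
    the current partial trajectory."""
    tokens = []
    for demo in demos:
        for obs, action in demo:
            tokens += [encode_obs(obs), encode_action(action)]
    for obs, action in current_traj:
        tokens += [encode_obs(obs), encode_action(action)]
    return tokens[-max_len:]  # truncate to the model's context window
```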

Understanding the Effects of RLHF on LLM Generalisation and Diversity

1 code implementation 10 Oct 2023 Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu

OOD generalisation is crucial given the wide range of real-world scenarios in which these models are being used, while output diversity refers to the model's ability to generate varied outputs and is important for a variety of use cases.

Instruction Following
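
Output diversity can be made concrete with, for example, a distinct-n metric over a batch of sampled generations (one of several possible per-input and cross-input measures; the paper uses a broader set):

```python
def distinct_n(outputs, n=2):
    """Fraction of n-grams across a set of model outputs that are unique:
    higher means more varied generations."""
    ngrams = []
    for text in outputs:
        words = text.split()
        ngrams += [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

samples = ["the cat sat", "the cat slept", "a dog barked loudly"]
print(distinct_n(samples, n=2))  # ~0.86: the bigram "the cat" repeats
```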

Chain-of-Verification Reduces Hallucination in Large Language Models

1 code implementation 20 Sep 2023 Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston

Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models.

Hallucination, Text Generation
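
The method itself is a four-step pipeline: draft an answer, plan verification questions, answer each question independently of the draft, then produce a revised response. A minimal sketch, with `llm` as a stand-in for any prompt-to-text model call:

```python
def chain_of_verification(llm, query):
    """Draft -> plan verifications -> answer them independently -> revise."""
    draft = llm(f"Answer the question: {query}")
    plan = llm(f"List verification questions to fact-check this answer:\n{draft}")
    questions = [q.strip() for q in plan.splitlines() if q.strip()]
    # Each question is answered on its own, so the draft cannot bias the check.
    checks = [(q, llm(q)) for q in questions]
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return llm(
        f"Question: {query}\nDraft answer: {draft}\n"
        f"Verification results:\n{evidence}\n"
        "Write a final answer consistent with the verification results."
    )
```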

Challenges and Applications of Large Language Models

no code implementations 19 Jul 2023 Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy

Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas.

A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs

2 code implementations 5 Jun 2023 Mikael Henaff, Minqi Jiang, Roberta Raileanu

This results in an algorithm which sets a new state of the art across 16 tasks from the MiniHack suite used in prior work, and also performs robustly on Habitat and Montezuma's Revenge.

Montezuma's Revenge
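
A count-based caricature of the combination the paper studies: a global (lifelong) novelty term multiplied by an episodic one. The paper's actual episodic term is an elliptical bonus over learned embeddings rather than a table of counts, so treat this only as a sketch of the multiplicative structure.

```python
from collections import defaultdict
from math import sqrt

class CombinedBonus:
    """Multiplicative combination of a global and an episodic count bonus."""
    def __init__(self):
        self.global_counts = defaultdict(int)    # persists across episodes
        self.episode_counts = defaultdict(int)   # cleared every episode

    def reset_episode(self):
        self.episode_counts.clear()

    def bonus(self, state):
        self.global_counts[state] += 1
        self.episode_counts[state] += 1
        global_term = 1.0 / sqrt(self.global_counts[state])
        episodic_term = 1.0 / sqrt(self.episode_counts[state])
        return global_term * episodic_term
```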

Hyperparameters in Reinforcement Learning and How To Tune Them

1 code implementation 2 Jun 2023 Theresa Eimer, Marius Lindauer, Roberta Raileanu

In order to improve reproducibility, deep reinforcement learning (RL) has been adopting better scientific practices such as standardized evaluation metrics and reporting.

Hyperparameter Optimization, reinforcement-learning +1
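
For illustration, random search is one of the simple, parallelizable baselines such tuning protocols build on; the search space below is purely illustrative, not the paper's recommendation.

```python
import random

def sample_config(rng=random):
    """Draw one candidate RL configuration from a small search space."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -3),
        "gamma": 1.0 - 10 ** rng.uniform(-3, -1),
        "clip_range": rng.choice([0.1, 0.2, 0.3]),
        "n_epochs": rng.randint(1, 10),
    }

# Train one short run per candidate (ideally across several seeds) and
# keep the configuration with the best mean evaluation return.
configs = [sample_config() for _ in range(20)]
```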

MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

no code implementations 6 Mar 2023 Mikayel Samvelyan, Akbir Khan, Michael Dennis, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Roberta Raileanu, Tim Rocktäschel

Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents.

Continuous Control, Multi-agent Reinforcement Learning +2
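
A toy sketch of the curriculum machinery behind such methods, in the spirit of regret-based prioritized replay extended to joint (level, co-player) pairs; the buffer, scoring, and sampling scheme here are illustrative simplifications, not MAESTRO's actual algorithm.

```python
import random

class RegretBuffer:
    """Replay the (level, co_player) pairs with the highest estimated regret."""
    def __init__(self, capacity=256):
        self.capacity = capacity
        self.entries = []  # (regret_estimate, level, co_player)

    def add(self, regret, level, co_player):
        self.entries.append((regret, level, co_player))
        self.entries.sort(key=lambda e: e[0], reverse=True)
        del self.entries[self.capacity:]  # drop the lowest-regret pairs

    def sample(self):
        # Rank-weighted sampling: higher-regret pairs are replayed more often.
        weights = [1.0 / (rank + 1) for rank in range(len(self.entries))]
        _, level, co_player = random.choices(self.entries, weights)[0]
        return level, co_player
```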

Building a Subspace of Policies for Scalable Continual Learning

1 code implementation 18 Nov 2022 Jean-Baptiste Gaya, Thang Doan, Lucas Caccia, Laure Soulier, Ludovic Denoyer, Roberta Raileanu

We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks.

Continual Learning
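
The core representational idea: a policy in the subspace is a convex combination of a few anchor parameter sets, theta(alpha) = sum_i alpha_i * theta_i with alpha on the simplex, and a new task tentatively grows the subspace by one anchor. A minimal sketch of evaluating one such combination (the helper name is illustrative):

```python
def combine_anchors(anchor_state_dicts, alphas):
    """Weight-space convex combination of anchor policies; the result can
    be loaded into a policy network with load_state_dict."""
    assert abs(sum(alphas) - 1.0) < 1e-6 and all(a >= 0 for a in alphas)
    return {
        name: sum(a * sd[name] for a, sd in zip(alphas, anchor_state_dicts))
        for name in anchor_state_dicts[0]
    }
```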

Dungeons and Data: A Large-Scale NetHack Dataset

1 code implementation 1 Nov 2022 Eric Hambro, Roberta Raileanu, Danielle Rothermel, Vegard Mella, Tim Rocktäschel, Heinrich Küttler, Naila Murray

Recent breakthroughs in the development of agents to solve challenging sequential decision making problems such as Go, StarCraft, or DOTA, have relied on both simulated environments and large-scale datasets.

Decision Making, NetHack +2

Exploration via Elliptical Episodic Bonuses

3 code implementations 11 Oct 2022 Mikael Henaff, Roberta Raileanu, Minqi Jiang, Tim Rocktäschel

In recent years, a number of reinforcement learning (RL) methods have been proposed to explore complex environments which differ across episodes.

Reinforcement Learning (RL)
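
The bonus itself: with an embedding phi (learned via an inverse-dynamics model in the paper), each episode maintains C = lambda*I + sum of phi phi^T over visited states, and the reward bonus is b(s) = phi(s)^T C^{-1} phi(s). A NumPy sketch using a Sherman-Morrison rank-1 update, with `phi` passed in as a callable:

```python
import numpy as np

class EllipticalBonus:
    """Episodic elliptical bonus: large for embeddings unlike those
    already visited this episode, shrinking as the episode fills out."""
    def __init__(self, phi, dim, ridge=0.1):
        self.phi = phi
        self.dim = dim
        self.ridge = ridge
        self.reset_episode()

    def reset_episode(self):
        self.cov_inverse = np.eye(self.dim) / self.ridge  # (lambda * I)^-1

    def bonus(self, obs):
        v = self.phi(obs)
        b = float(v @ self.cov_inverse @ v)
        # Sherman-Morrison rank-1 update of C^-1 after adding v v^T to C.
        u = self.cov_inverse @ v
        self.cov_inverse -= np.outer(u, u) / (1.0 + v @ u)
        return b
```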

Decoupling Value and Policy for Generalization in Reinforcement Learning

2 code implementations 20 Feb 2021 Roberta Raileanu, Rob Fergus

Standard deep reinforcement learning algorithms use a shared representation for the policy and value function, especially when training directly from images.

reinforcement-learning, Reinforcement Learning (RL)
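
The contrast the abstract draws, as a sketch: a standard shared-trunk actor-critic versus fully separate policy and value networks. (The paper's IDAAC additionally trains the policy branch with an advantage head; that detail is omitted here.)

```python
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """Standard setup: one encoder feeds both heads, so value-function
    gradients shape the policy's representation."""
    def __init__(self, obs_dim, n_actions, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs):
        h = self.encoder(obs)
        return self.policy_head(h), self.value_head(h)

class DecoupledActorCritic(nn.Module):
    """Decoupled setup: separate encoders, so the policy representation
    is not driven by value-prediction gradients."""
    def __init__(self, obs_dim, n_actions, hidden=256):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))
        self.value = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, obs):
        return self.policy(obs), self.value(obs)
```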

Fast Adaptation via Policy-Dynamics Value Functions

1 code implementation 6 Jul 2020 Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus

An ensemble of conventional RL policies is used to gather experience on training environments, from which embeddings of both policies and environments can be learned.

The NetHack Learning Environment

3 code implementations NeurIPS 2020 Heinrich Küttler, Nantas Nardelli, Alexander H. Miller, Roberta Raileanu, Marco Selvatici, Edward Grefenstette, Tim Rocktäschel

Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack.

NetHack Score, Reinforcement Learning (RL) +1
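
NLE exposes a standard Gym interface (Gym's API at the time of release; the package installs with `pip install nle`). A minimal random-agent loop:

```python
import gym
import nle  # registers the NetHack environments with gym on import

env = gym.make("NetHackScore-v0")
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
env.close()
print("episode score:", total_reward)
```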

RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments

2 code implementations ICLR 2020 Roberta Raileanu, Tim Rocktäschel

However, we show that existing methods fall short in procedurally-generated environments where an agent is unlikely to visit a state more than once.
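
RIDE rewards transitions that change a learned state embedding, discounted by an episodic visitation count so the bonus vanishes for states revisited within an episode: r_int = ||phi(s') - phi(s)|| / sqrt(N_ep(s')). A sketch, with `phi` standing in for the embedding trained via forward and inverse dynamics in the paper, and `next_obs_key` a hashable (e.g., discretized) state identifier:

```python
from collections import defaultdict
from math import sqrt

import numpy as np

class RideReward:
    """Impact-driven intrinsic reward with episodic count discounting."""
    def __init__(self, phi):
        self.phi = phi
        self.episode_counts = defaultdict(int)

    def reset_episode(self):
        self.episode_counts.clear()

    def reward(self, obs, next_obs, next_obs_key):
        self.episode_counts[next_obs_key] += 1
        impact = np.linalg.norm(self.phi(next_obs) - self.phi(obs))
        return impact / sqrt(self.episode_counts[next_obs_key])
```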

Backplay: 'Man muss immer umkehren'

no code implementations ICLR 2019 Cinjon Resnick, Roberta Raileanu, Sanyam Kapoor, Alexander Peysakhovich, Kyunghyun Cho, Joan Bruna

Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency.

Reinforcement Learning (RL)
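
Backplay's core trick is a curriculum over start states along a demonstration: episodes initially start near the demonstration's end, and the start point recedes toward the beginning as training progresses. A sketch with a linear schedule (the paper uses a staged window schedule; this is a simplification):

```python
import random

def backplay_start_index(demo_length, step, total_steps, final_window=10):
    """Sample an episode start index along a demonstration: late states
    early in training, progressively earlier states later on."""
    progress = min(step / total_steps, 1.0)  # 0 -> 1 over training
    earliest = int((1.0 - progress) * (demo_length - final_window))
    return random.randint(max(earliest, 0), demo_length - 1)
```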

Backplay: "Man muss immer umkehren"

1 code implementation 18 Jul 2018 Cinjon Resnick, Roberta Raileanu, Sanyam Kapoor, Alexander Peysakhovich, Kyunghyun Cho, Joan Bruna

Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency.

Reinforcement Learning (RL)

Modeling Others using Oneself in Multi-Agent Reinforcement Learning

1 code implementation ICML 2018 Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus

We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility.

Multi-agent Reinforcement Learning, reinforcement-learning +1
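
The paper's premise can be sketched as goal inference through one's own policy: optimize a latent goal z so that your policy, conditioned on z, best explains the other agent's observed actions. The names and the gradient-based inner loop below are an illustrative rendering of that idea, assuming a goal-conditioned policy that returns action logits:

```python
import torch
import torch.nn.functional as F

def infer_other_goal(policy, other_obs, other_actions, goal_dim,
                     steps=50, lr=0.1):
    """Fit a latent goal z by maximizing the likelihood of the other
    agent's actions under one's own goal-conditioned policy."""
    z = torch.zeros(goal_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        logits = policy(other_obs, z.expand(len(other_obs), -1))
        loss = F.cross_entropy(logits, other_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```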
