Search Results for author: Eric Hambro

Found 11 papers, 8 papers with code

Know When To Stop: A Study of Semantic Drift in Text Generation

no code implementations • 8 Apr 2024 • Ava Spataru, Eric Hambro, Elena Voita, Nicola Cancedda

Overall, our methods generalize and can be applied to any long-form text generation to produce more reliable information by balancing trade-offs between factual accuracy, information quantity, and computational cost.

Semantic Similarity · Semantic Textual Similarity · +1

Teaching Large Language Models to Reason with Reinforcement Learning

no code implementations • 7 Mar 2024 • Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu

Surprisingly, we find that the sample complexity of Expert Iteration is similar to that of PPO, requiring at most on the order of $10^6$ samples to converge from a pretrained checkpoint (a schematic sketch of an Expert Iteration loop follows this entry).

reinforcement-learning
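
As referenced above, the entry compares the sample complexity of Expert Iteration with PPO. The following is a hedged, schematic sketch of what an Expert Iteration loop for reasoning tasks can look like; `generate_answers`, `is_correct`, and `finetune` are hypothetical placeholders standing in for generation, reward-checking, and supervised fine-tuning machinery, not APIs from the paper or any specific library.

```python
# Schematic sketch of an Expert Iteration loop for reasoning tasks.
# All helper callables are hypothetical placeholders supplied by the caller.
from typing import Callable, List, Tuple


def expert_iteration(
    model,
    prompts: List[str],
    references: List[str],
    generate_answers: Callable,   # hypothetical: sample k answers per prompt
    is_correct: Callable,         # hypothetical: binary reward vs. reference
    finetune: Callable,           # hypothetical: supervised fine-tuning step
    rounds: int = 3,
    samples_per_prompt: int = 4,
):
    for _ in range(rounds):
        dataset: List[Tuple[str, str]] = []
        for prompt, ref in zip(prompts, references):
            # 1. Sample candidate solutions from the current policy.
            answers = generate_answers(model, prompt, k=samples_per_prompt)
            # 2. Keep only candidates the reward function accepts.
            dataset += [(prompt, a) for a in answers if is_correct(a, ref)]
        # 3. Fine-tune the policy on its own accepted samples ("expert" data).
        model = finetune(model, dataset)
    return model
```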

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

no code implementations • 26 Feb 2024 • Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance.

Question Answering

Generalization to New Sequential Decision Making Tasks with In-Context Learning

1 code implementation • 6 Dec 2023 • Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

By training on large diverse offline datasets, our model is able to learn new MiniHack and Procgen tasks without any weight updates from just a handful of demonstrations.

Decision Making · In-Context Learning

Understanding the Effects of RLHF on LLM Generalisation and Diversity

1 code implementation • 10 Oct 2023 • Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu

OOD generalisation is crucial given the wide range of real-world scenarios in which these models are being used, while output diversity refers to the model's ability to generate varied outputs and is important for a variety of use cases.

Instruction Following

Dungeons and Data: A Large-Scale NetHack Dataset

1 code implementation • 1 Nov 2022 • Eric Hambro, Roberta Raileanu, Danielle Rothermel, Vegard Mella, Tim Rocktäschel, Heinrich Küttler, Naila Murray

Recent breakthroughs in the development of agents to solve challenging sequential decision making problems such as Go, StarCraft, or DOTA have relied on both simulated environments and large-scale datasets.

Decision Making · NetHack · +2

moolib: A Platform for Distributed RL

1 code implementation • 26 Jan 2022 • Vegard Mella, Eric Hambro, Danielle Rothermel, Heinrich Küttler

Together with the moolib library, we present example user code which shows how moolib’s components can be used to implement common reinforcement learning agents as a simple but scalable distributed network of homogeneous peers.

reinforcement-learning · Reinforcement Learning (RL)

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

1 code implementation • 27 Sep 2021 • Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel

By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack allows designing custom RL testbeds that are fast and convenient to use (a minimal usage sketch follows this entry).

NetHack · reinforcement-learning · +2
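
As referenced in the entry above, here is a minimal usage sketch that treats a registered MiniHack task as a standard Gym environment. The environment ID `MiniHack-River-v0` and the classic four-tuple `step` interface follow the project's public README at the time of release; treat them as assumptions rather than as part of the listing above.

```python
# Minimal sketch: running random actions in a registered MiniHack task via
# the classic Gym API. Assumes `pip install minihack` and that importing
# minihack registers environment IDs such as "MiniHack-River-v0" (per the
# project README); the ID and API version are assumptions, not guarantees.
import gym
import minihack  # noqa: F401  -- importing registers the MiniHack-* env IDs

env = gym.make("MiniHack-River-v0")
obs = env.reset()  # each reset procedurally generates a fresh level
for _ in range(16):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```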
