Search Results for author: Sainbayar Sukhbaatar

Found 38 papers, 19 papers with code

Reverse Training to Nurse the Reversal Curse

no code implementations • 20 Mar 2024 • Olga Golovneva, Zeyuan Allen-Zhu, Jason Weston, Sainbayar Sukhbaatar

Large language models (LLMs) have a surprising failure: when trained on "A has a feature B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse.

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

1 code implementation • 12 Mar 2024 • Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu Xu, Xi Victoria Lin, Baptiste Rozière, Jacob Kahn, Daniel Li, Wen-tau Yih, Jason Weston, Xian Li

We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge.

Ranked #30 on Question Answering on TriviaQA

Arithmetic Reasoning Code Generation +6

207

Teaching Large Language Models to Reason with Reinforcement Learning

no code implementations • 7 Mar 2024 • Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu

Surprisingly, we find the sample complexity of Expert Iteration is similar to that of PPO, requiring at most on the order of $10^6$ samples to converge from a pretrained checkpoint.

reinforcement-learning

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

no code implementations • 21 Feb 2024 • Lucas Lehnert, Sainbayar Sukhbaatar, Paul McVay, Michael Rabbat, Yuandong Tian

In this work, we demonstrate how to train Transformers to solve complex planning tasks and present Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than standard $A^*$ search.

Decision Making

Self-Rewarding Language Models

2 code implementations • 18 Jan 2024 • Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston

We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal.

Instruction Following Language Modelling

1,234

Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss

no code implementations • 27 Dec 2023 • Jing Xu, Andrew Lee, Sainbayar Sukhbaatar, Jason Weston

Practitioners commonly align large language models using pairwise preferences, i.e., given labels of the type response A is preferred to response B for a given input.

System 2 Attention (is something you might need too)

no code implementations • 20 Nov 2023 • Jason Weston, Sainbayar Sukhbaatar

Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations.

Math

A Data Source for Reasoning Embodied Agents

1 code implementation • 14 Sep 2023 • Jack Lanchantin, Sainbayar Sukhbaatar, Gabriel Synnaeve, Yuxuan Sun, Kavya Srinet, Arthur Szlam

In this work, to further pursue these advances, we introduce a new data generator for machine reasoning that integrates with an embodied agent.

Improving Open Language Models by Learning from Organic Interactions

no code implementations • 7 Jun 2023 • Jing Xu, Da Ju, Joshua Lane, Mojtaba Komeili, Eric Michael Smith, Megan Ung, Morteza Behrooz, William Ngan, Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster

We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety.

Large Language Model Programs

no code implementations • 9 May 2023 • Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li

In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples.

Language Modelling Large Language Model +1

Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions

no code implementations • 18 Apr 2023 • Lina Mezghani, Piotr Bojanowski, Karteek Alahari, Sainbayar Sukhbaatar

The success of transformer models trained with a language modeling objective brings a promising opportunity to the reinforcement learning framework.

Language Modelling

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

no code implementations • 16 Feb 2023 • Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran

Video understanding tasks take many forms, from action detection to visual query localization and spatio-temporal grounding of sentences.

Action Detection Sentence +2

Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping

1 code implementation • 5 Jan 2023 • Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari

Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming.

Continuous Control Self-Supervised Learning

The CRINGE Loss: Learning what language not to model

no code implementations • 10 Nov 2022 • Leonard Adolphs, Tianyu Gao, Jing Xu, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples.

Language Modelling

Walk the Random Walk: Learning to Discover and Reach Goals Without Supervision

no code implementations • 23 Jun 2022 • Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Karteek Alahari

Finally, we train a goal-conditioned policy network with goals sampled from the goal memory and reward it by the reachability network and the goal memory.

Continuous Control

DIRECTOR: Generator-Classifiers For Supervised Language Modeling

1 code implementation • 15 Jun 2022 • Kushal Arora, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

Current language models achieve low perplexity but their resulting generations still suffer from toxic responses, repetitiveness and contradictions.

Language Modelling

10,426

Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL

no code implementations • 21 Mar 2022 • Akram Erraqabi, Marlos C. Machado, Mingde Zhao, Sainbayar Sukhbaatar, Alessandro Lazaric, Ludovic Denoyer, Yoshua Bengio

In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping.

Continuous Control Contrastive Learning +1

Exploration-Driven Representation Learning in Reinforcement Learning

no code implementations • ICML Workshop URL 2021 • Akram Erraqabi, Mingde Zhao, Marlos C. Machado, Yoshua Bengio, Sainbayar Sukhbaatar, Ludovic Denoyer, Alessandro Lazaric

In this work, we introduce a method that explicitly couples representation learning with exploration when the agent is not provided with a uniform prior over the state space.

reinforcement-learning Reinforcement Learning (RL) +1

Hash Layers For Large Sparse Models

no code implementations • NeurIPS 2021 • Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston

We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models.

Language Modelling

Staircase Attention for Recurrent Processing of Sequences

1 code implementation • 8 Jun 2021 • Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston

Attention mechanisms have become a standard tool for sequence modeling tasks, in particular by stacking self-attention layers over the entire input sequence as in the Transformer architecture.

Language Modelling

136

Not All Memories are Created Equal: Learning to Forget by Expiring

1 code implementation • 13 May 2021 • Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan

We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve strong performance on reinforcement learning tasks specifically designed to challenge this functionality.

Ranked #4 on Language Modelling on enwik8

Language Modelling

136

Memory-Augmented Reinforcement Learning for Image-Goal Navigation

1 code implementation • 13 Jan 2021 • Lina Mezghani, Sainbayar Sukhbaatar, Thibaut Lavril, Oleksandr Maksymets, Dhruv Batra, Piotr Bojanowski, Karteek Alahari

In this work, we present a memory-augmented approach for image-goal navigation.

Data Augmentation Navigate +2

Not All Memories are Created Equal: Learning to Expire

1 code implementation • 1 Jan 2021 • Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason E Weston, Angela Fan

We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve state of the art results on long-context language modeling, reinforcement learning, and algorithmic tasks.

Language Modelling

Learning to Visually Navigate in Photorealistic Environments Without any Supervision

no code implementations • 10 Apr 2020 • Lina Mezghani, Sainbayar Sukhbaatar, Arthur Szlam, Armand Joulin, Piotr Bojanowski

Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training.

Navigate Position

Addressing Some Limitations of Transformers with Feedback Memory

4 code implementations • 21 Feb 2020 • Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar

Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks.

Ranked #5 on Language Modelling on Penn Treebank (Character Level)

Language Modelling Machine Translation +1

47,992

Augmenting Self-attention with Persistent Memory

2 code implementations • 2 Jul 2019 • Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin

More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role as the feed-forward layer.

Ranked #5 on Language Modelling on Text8

Language Modelling Translation

4,124

Training Hybrid Language Models by Marginalizing over Segmentations

no code implementations • ACL 2019 • Edouard Grave, Sainbayar Sukhbaatar, Piotr Bojanowski, Armand Joulin

In this paper, we study the problem of hybrid language modeling, that is, using models which can predict both characters and larger units such as character ngrams or words.

Language Modelling

Adaptive Attention Span in Transformers

7 code implementations • ACL 2019 • Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin

We propose a novel self-attention mechanism that can learn its optimal attention span.

Ranked #4 on Language Modelling on Text8

8k Language Modelling

606

Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks

3 code implementations • ICLR 2019 • Amanpreet Singh, Tushar Jain, Sainbayar Sukhbaatar

Learning when to communicate and doing that effectively is essential in multi-agent tasks.

Starcraft

202

Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning

2 code implementations • 22 Nov 2018 • Sainbayar Sukhbaatar, Emily Denton, Arthur Szlam, Rob Fergus

In hierarchical reinforcement learning a major challenge is determining appropriate low-level policies.

Hierarchical Reinforcement Learning reinforcement-learning +1

Planning with Arithmetic and Geometric Attributes

no code implementations • 6 Sep 2018 • David Folqué, Sainbayar Sukhbaatar, Arthur Szlam, Joan Bruna

A desirable property of an intelligent agent is its ability to understand its environment to quickly generalize to novel tasks and compose simpler tasks into more complex ones.

Composable Planning with Attributes

no code implementations • ICML 2018 • Amy Zhang, Adam Lerer, Sainbayar Sukhbaatar, Rob Fergus, Arthur Szlam

The tasks that an agent will need to solve often are not known during training.

Attribute Starcraft

Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

3 code implementations • ICLR 2018 • Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus

When Bob is deployed on an RL task within the environment, this unsupervised training reduces the number of supervised episodes needed to learn, and in some cases converges to a higher reward.