Search Results for author: Sainbayar Sukhbaatar

Found 38 papers, 19 papers with code

Reverse Training to Nurse the Reversal Curse

No code implementations · 20 Mar 2024 · Olga Golovneva, Zeyuan Allen-Zhu, Jason Weston, Sainbayar Sukhbaatar

Large language models (LLMs) have a surprising failure: when trained on "A has a feature B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse.

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

1 code implementation · 12 Mar 2024 · Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu Xu, Xi Victoria Lin, Baptiste Rozière, Jacob Kahn, Daniel Li, Wen-tau Yih, Jason Weston, Xian Li

We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge.

Arithmetic Reasoning, Code Generation, +6 more

Teaching Large Language Models to Reason with Reinforcement Learning

No code implementations · 7 Mar 2024 · Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu

Surprisingly, we find the sample complexity of Expert Iteration is similar to that of PPO, requiring at most on the order of $10^6$ samples to converge from a pretrained checkpoint.

reinforcement-learning

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

No code implementations · 21 Feb 2024 · Lucas Lehnert, Sainbayar Sukhbaatar, Paul McVay, Michael Rabbat, Yuandong Tian

In this work, we demonstrate how to train Transformers to solve complex planning tasks and present Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than standard $A^*$ search.

Decision Making

Self-Rewarding Language Models

2 code implementations · 18 Jan 2024 · Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston

We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal.

Instruction Following, Language Modelling

Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss

No code implementations · 27 Dec 2023 · Jing Xu, Andrew Lee, Sainbayar Sukhbaatar, Jason Weston

Practitioners commonly align large language models using pairwise preferences, i.e., given labels of the type "response A is preferred to response B" for a given input.

System 2 Attention (is something you might need too)

No code implementations · 20 Nov 2023 · Jason Weston, Sainbayar Sukhbaatar

Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations.

Math

A Data Source for Reasoning Embodied Agents

1 code implementation · 14 Sep 2023 · Jack Lanchantin, Sainbayar Sukhbaatar, Gabriel Synnaeve, Yuxuan Sun, Kavya Srinet, Arthur Szlam

In this work, to further pursue these advances, we introduce a new data generator for machine reasoning that integrates with an embodied agent.

Improving Open Language Models by Learning from Organic Interactions

No code implementations · 7 Jun 2023 · Jing Xu, Da Ju, Joshua Lane, Mojtaba Komeili, Eric Michael Smith, Megan Ung, Morteza Behrooz, William Ngan, Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster

We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety.

Large Language Model Programs

No code implementations · 9 May 2023 · Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li

In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples.

Language Modelling, Large Language Model, +1 more

Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions

No code implementations · 18 Apr 2023 · Lina Mezghani, Piotr Bojanowski, Karteek Alahari, Sainbayar Sukhbaatar

The success of transformer models trained with a language modeling objective brings a promising opportunity to the reinforcement learning framework.

Language Modelling

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

No code implementations · 16 Feb 2023 · Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran

Video understanding tasks take many forms, from action detection to visual query localization and spatio-temporal grounding of sentences.

Action Detection, Sentence, +2 more

Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping

1 code implementation · 5 Jan 2023 · Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari

Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming.

Continuous Control, Self-Supervised Learning

The CRINGE Loss: Learning what language not to model

No code implementations · 10 Nov 2022 · Leonard Adolphs, Tianyu Gao, Jing Xu, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples.

Language Modelling

Walk the Random Walk: Learning to Discover and Reach Goals Without Supervision

No code implementations · 23 Jun 2022 · Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Karteek Alahari

Finally, we train a goal-conditioned policy network with goals sampled from the goal memory and reward it by the reachability network and the goal memory.

Continuous Control

DIRECTOR: Generator-Classifiers For Supervised Language Modeling

1 code implementation · 15 Jun 2022 · Kushal Arora, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

Current language models achieve low perplexity but their resulting generations still suffer from toxic responses, repetitiveness and contradictions.

Language Modelling

Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL

No code implementations · 21 Mar 2022 · Akram Erraqabi, Marlos C. Machado, Mingde Zhao, Sainbayar Sukhbaatar, Alessandro Lazaric, Ludovic Denoyer, Yoshua Bengio

In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping.

Continuous Control, Contrastive Learning, +1 more

Hash Layers For Large Sparse Models

No code implementations · NeurIPS 2021 · Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston

We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models.

Language Modelling
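The Hash Layers abstract above describes sparse layers whose parameters are chosen per input by hashing. The sketch below shows the idea in minimal form, not the authors' code. It assumes each token's vocabulary id is mapped to one expert through a fixed, non-learned random lookup table. `NUM_EXPERTS`, `VOCAB_SIZE`, the seed, and the toy "experts" (a ReLU followed by a per-expert scale) are all hypothetical stand-ins for real expert feed-forward blocks.

```python
import random
from typing import Callable, List

NUM_EXPERTS = 4   # hypothetical number of expert blocks
VOCAB_SIZE = 100  # hypothetical vocabulary size

# Assumption: the "hash" is a fixed random table built once (seeded) and never trained.
_rng = random.Random(0)
HASH_TABLE = [_rng.randrange(NUM_EXPERTS) for _ in range(VOCAB_SIZE)]


def make_expert(scale: float) -> Callable[[List[float]], List[float]]:
    """Hypothetical stand-in for an expert feed-forward block: ReLU, then a fixed scale."""
    def expert(x: List[float]) -> List[float]:
        return [scale * max(0.0, v) for v in x]
    return expert


# Each expert owns different "parameters" (here, just a different scale).
EXPERTS = [make_expert(float(i + 1)) for i in range(NUM_EXPERTS)]


def route(token_id: int) -> int:
    """Look up the expert for a token id; the same id always gets the same expert."""
    return HASH_TABLE[token_id]


def hash_layer(token_ids: List[int], hidden: List[List[float]]) -> List[List[float]]:
    """Send each position's hidden state only through the expert its token id hashes to."""
    return [EXPERTS[route(t)](h) for t, h in zip(token_ids, hidden)]


# Usage: token id 5 appears twice with the same hidden state, so both positions
# are processed by the same expert and produce identical outputs.
ids = [5, 17, 5]
hidden = [[1.0, -2.0], [0.5, 0.5], [1.0, -2.0]]
out = hash_layer(ids, hidden)
```

Because routing depends only on the token id, no router has to be learned. The trade-off is that load balance across experts depends entirely on how the hash spreads the vocabulary.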

Staircase Attention for Recurrent Processing of Sequences

1 code implementation · 8 Jun 2021 · Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston

Attention mechanisms have become a standard tool for sequence modeling tasks, in particular by stacking self-attention layers over the entire input sequence as in the Transformer architecture.

Language Modelling

Not All Memories are Created Equal: Learning to Forget by Expiring

1 code implementation · 13 May 2021 · Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan

We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve strong performance on reinforcement learning tasks specifically designed to challenge this functionality.

Language Modelling

Not All Memories are Created Equal: Learning to Expire

1 code implementation · 1 Jan 2021 · Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason E Weston, Angela Fan

We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve state-of-the-art results on long-context language modeling, reinforcement learning, and algorithmic tasks.

Language Modelling

Learning to Visually Navigate in Photorealistic Environments Without any Supervision

No code implementations · 10 Apr 2020 · Lina Mezghani, Sainbayar Sukhbaatar, Arthur Szlam, Armand Joulin, Piotr Bojanowski

Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training.

Navigate, Position

Augmenting Self-attention with Persistent Memory

2 code implementations · 2 Jul 2019 · Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin

More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role as the feed-forward layer.

Language Modelling, Translation
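The persistent-memory abstract above augments self-attention with memory vectors that do not depend on the input. The sketch below is a minimal, single-head illustration, not the authors' implementation. It assumes the persistent key and value vectors are simply appended to the context keys and values before ordinary scaled dot-product attention. The function names are hypothetical. In a real model the memory vectors would be learned parameters; here they are fixed lists so the arithmetic is easy to follow.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_with_persistent_memory(
    query: Vector,
    ctx_keys: List[Vector],
    ctx_values: List[Vector],
    mem_keys: List[Vector],
    mem_values: List[Vector],
) -> Vector:
    """Single-head scaled dot-product attention over context plus persistent memory."""
    # The persistent vectors are input-independent; appending them to the
    # context keys/values lets every query attend to them like extra tokens.
    keys = ctx_keys + mem_keys
    values = ctx_values + mem_values
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Usage: one context token and one persistent slot receive equal scores,
# so the output is the average of their two value vectors.
out = attend_with_persistent_memory(
    query=[1.0, 0.0],
    ctx_keys=[[0.0, 0.0]], ctx_values=[[2.0, 0.0]],
    mem_keys=[[0.0, 0.0]], mem_values=[[0.0, 2.0]],
)
```

Even with an empty context, a query can still read from the persistent slots. This is one way to see how such vectors can take on a role similar to a feed-forward layer.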

Training Hybrid Language Models by Marginalizing over Segmentations

No code implementations · ACL 2019 · Edouard Grave, Sainbayar Sukhbaatar, Piotr Bojanowski, Armand Joulin

In this paper, we study the problem of hybrid language modeling, that is, using models which can predict both characters and larger units such as character n-grams or words.

Language Modelling

Planning with Arithmetic and Geometric Attributes

No code implementations · 6 Sep 2018 · David Folqué, Sainbayar Sukhbaatar, Arthur Szlam, Joan Bruna

A desirable property of an intelligent agent is its ability to understand its environment to quickly generalize to novel tasks and compose simpler tasks into more complex ones.

Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

3 code implementations · ICLR 2018 · Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus

When Bob is deployed on an RL task within the environment, this unsupervised training reduces the number of supervised episodes needed to learn, and in some cases converges to a higher reward.

MazeBase: A Sandbox for Learning from Games

2 code implementations · 23 Nov 2015 · Sainbayar Sukhbaatar, Arthur Szlam, Gabriel Synnaeve, Soumith Chintala, Rob Fergus

This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning.

Negation, Reinforcement Learning (RL), +1 more

Training Convolutional Networks with Noisy Labels

No code implementations · 9 Jun 2014 · Sainbayar Sukhbaatar, Joan Bruna, Manohar Paluri, Lubomir Bourdev, Rob Fergus

The availability of large labeled datasets has allowed Convolutional Network models to achieve impressive recognition results.

General Classification
