no code implementations • 20 Mar 2024 • Olga Golovneva, Zeyuan Allen-Zhu, Jason Weston, Sainbayar Sukhbaatar
Large language models (LLMs) have a surprising failure: when trained on "A has a feature B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse.
1 code implementation • 12 Mar 2024 • Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu Xu, Xi Victoria Lin, Baptiste Rozière, Jacob Kahn, Daniel Li, Wen-tau Yih, Jason Weston, Xian Li
We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge.
Ranked #30 on Question Answering on TriviaQA
no code implementations • 7 Mar 2024 • Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu
Surprisingly, we find the sample complexity of Expert Iteration is similar to that of PPO, requiring at most on the order of $10^6$ samples to converge from a pretrained checkpoint.
no code implementations • 21 Feb 2024 • Lucas Lehnert, Sainbayar Sukhbaatar, Paul McVay, Michael Rabbat, Yuandong Tian
In this work, we demonstrate how to train Transformers to solve complex planning tasks and present Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than standard $A^*$ search.
2 code implementations • 18 Jan 2024 • Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston
We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal.
no code implementations • 27 Dec 2023 • Jing Xu, Andrew Lee, Sainbayar Sukhbaatar, Jason Weston
Practitioners commonly align large language models using pairwise preferences, i.e., given labels of the type "response A is preferred to response B" for a given input.
no code implementations • 20 Nov 2023 • Jason Weston, Sainbayar Sukhbaatar
Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations.
1 code implementation • 14 Sep 2023 • Jack Lanchantin, Sainbayar Sukhbaatar, Gabriel Synnaeve, Yuxuan Sun, Kavya Srinet, Arthur Szlam
In this work, to further pursue these advances, we introduce a new data generator for machine reasoning that integrates with an embodied agent.
no code implementations • 7 Jun 2023 • Jing Xu, Da Ju, Joshua Lane, Mojtaba Komeili, Eric Michael Smith, Megan Ung, Morteza Behrooz, William Ngan, Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster
We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety.
no code implementations • 9 May 2023 • Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li
In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples.
no code implementations • 18 Apr 2023 • Lina Mezghani, Piotr Bojanowski, Karteek Alahari, Sainbayar Sukhbaatar
The success of transformer models trained with a language modeling objective brings a promising opportunity to the reinforcement learning framework.
no code implementations • 16 Feb 2023 • Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran
Video understanding tasks take many forms, from action detection to visual query localization and spatio-temporal grounding of sentences.
1 code implementation • 5 Jan 2023 • Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari
Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming.
no code implementations • 10 Nov 2022 • Leonard Adolphs, Tianyu Gao, Jing Xu, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston
Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples.
no code implementations • 23 Jun 2022 • Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Karteek Alahari
Finally, we train a goal-conditioned policy network with goals sampled from the goal memory and reward it by the reachability network and the goal memory.
1 code implementation • 15 Jun 2022 • Kushal Arora, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston
Current language models achieve low perplexity but their resulting generations still suffer from toxic responses, repetitiveness and contradictions.
no code implementations • 21 Mar 2022 • Akram Erraqabi, Marlos C. Machado, Mingde Zhao, Sainbayar Sukhbaatar, Alessandro Lazaric, Ludovic Denoyer, Yoshua Bengio
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping.
no code implementations • ICML Workshop URL 2021 • Akram Erraqabi, Mingde Zhao, Marlos C. Machado, Yoshua Bengio, Sainbayar Sukhbaatar, Ludovic Denoyer, Alessandro Lazaric
In this work, we introduce a method that explicitly couples representation learning with exploration when the agent is not provided with a uniform prior over the state space.
no code implementations • NeurIPS 2021 • Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston
We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models.
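To make the idea of hash-based parameter selection concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes routing is done by a fixed hash of the token id, and that each "expert" is a single square weight matrix; the names `build_experts`, `route`, and `hash_layer` are hypothetical and chosen only for illustration.

```python
import random
import zlib


def build_experts(num_experts, dim, seed=0):
    """Create one dim x dim weight matrix per expert (random, for illustration)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
        for _ in range(num_experts)
    ]


def route(token_id, num_experts):
    """Pick an expert with a fixed hash of the token id (assumed routing rule).

    The mapping is deterministic and has no learned parameters.
    """
    return zlib.crc32(str(token_id).encode("utf-8")) % num_experts


def hash_layer(token_ids, hidden, experts):
    """Apply to each position the expert matrix selected by its token's hash."""
    outputs = []
    for token_id, h in zip(token_ids, hidden):
        weights = experts[route(token_id, len(experts))]
        outputs.append(
            [sum(w * x for w, x in zip(row, h)) for row in weights]
        )
    return outputs


token_ids = [5, 9, 5]
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
experts = build_experts(num_experts=4, dim=2)
out = hash_layer(token_ids, hidden, experts)
```

Because the routing is a fixed function of the input, two positions holding the same token always use the same parameters, and only one expert's weights are touched per position.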
1 code implementation • 8 Jun 2021 • Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston
Attention mechanisms have become a standard tool for sequence modeling tasks, in particular by stacking self-attention layers over the entire input sequence as in the Transformer architecture.
1 code implementation • 13 May 2021 • Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan
We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve strong performance on reinforcement learning tasks specifically designed to challenge this functionality.
Ranked #4 on Language Modelling on enwik8
1 code implementation • 13 Jan 2021 • Lina Mezghani, Sainbayar Sukhbaatar, Thibaut Lavril, Oleksandr Maksymets, Dhruv Batra, Piotr Bojanowski, Karteek Alahari
In this work, we present a memory-augmented approach for image-goal navigation.
1 code implementation • 1 Jan 2021 • Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason E Weston, Angela Fan
We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve state of the art results on long-context language modeling, reinforcement learning, and algorithmic tasks.
no code implementations • 10 Apr 2020 • Lina Mezghani, Sainbayar Sukhbaatar, Arthur Szlam, Armand Joulin, Piotr Bojanowski
Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training.
4 code implementations • 21 Feb 2020 • Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar
Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks.
2 code implementations • 2 Jul 2019 • Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin
More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role as the feed-forward layer.
Ranked #5 on Language Modelling on Text8
no code implementations • ACL 2019 • Edouard Grave, Sainbayar Sukhbaatar, Piotr Bojanowski, Armand Joulin
In this paper, we study the problem of hybrid language modeling, that is, using models that can predict both characters and larger units such as character n-grams or words.
7 code implementations • ACL 2019 • Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin
We propose a novel self-attention mechanism that can learn its optimal attention span.
Ranked #4 on Language Modelling on Text8
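One way a learnable attention span like the one in the entry above can be realized is with a soft mask over key distances. The sketch below is illustrative only: the ramp-shaped mask, the span parameter `z`, the ramp width `ramp`, and the function names are assumptions, not details stated in this listing. It assumes `z >= 0` so that the nearest key is never fully masked.

```python
import math


def span_mask(distance, z, ramp):
    """Soft mask: 1 within span z, linear decay over `ramp`, then 0 (assumed form)."""
    return min(max((ramp + z - distance) / ramp, 0.0), 1.0)


def masked_attention(scores, z, ramp):
    """Normalize attention weights after applying the span mask.

    scores[d] is the raw attention score for the key at distance d from the query.
    """
    top = max(scores)  # subtract the max for numerical stability
    weights = [
        span_mask(d, z, ramp) * math.exp(s - top) for d, s in enumerate(scores)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Eight keys with equal scores; span of 4 with a ramp of 2.
attn = masked_attention([0.0] * 8, z=4, ramp=2)
```

Since the mask is piecewise linear in `z`, a gradient can flow to the span parameter, which is what would let a model shrink or grow its span during training under this formulation.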
3 code implementations • ICLR 2019 • Amanpreet Singh, Tushar Jain, Sainbayar Sukhbaatar
Learning when to communicate and doing that effectively is essential in multi-agent tasks.
2 code implementations • 22 Nov 2018 • Sainbayar Sukhbaatar, Emily Denton, Arthur Szlam, Rob Fergus
In hierarchical reinforcement learning a major challenge is determining appropriate low-level policies.
no code implementations • 6 Sep 2018 • David Folqué, Sainbayar Sukhbaatar, Arthur Szlam, Joan Bruna
A desirable property of an intelligent agent is its ability to understand its environment to quickly generalize to novel tasks and compose simpler tasks into more complex ones.
no code implementations • ICML 2018 • Amy Zhang, Adam Lerer, Sainbayar Sukhbaatar, Rob Fergus, Arthur Szlam
The tasks that an agent will need to solve often are not known during training.
3 code implementations • ICLR 2018 • Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus
When Bob is deployed on an RL task within the environment, this unsupervised training reduces the number of supervised episodes needed to learn, and in some cases converges to a higher reward.
9 code implementations • NeurIPS 2016 • Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus
Many tasks in AI require the collaboration of multiple agents.
7 code implementations • 7 Dec 2015 • Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus
We describe a very simple bag-of-words baseline for visual question answering.
2 code implementations • 23 Nov 2015 • Sainbayar Sukhbaatar, Arthur Szlam, Gabriel Synnaeve, Soumith Chintala, Rob Fergus
This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning.
44 code implementations • NeurIPS 2015 • Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus
For the former our approach is competitive with Memory Networks, but with less supervision.
Ranked #6 on Question Answering on bAbI
no code implementations • 9 Jun 2014 • Sainbayar Sukhbaatar, Joan Bruna, Manohar Paluri, Lubomir Bourdev, Rob Fergus
The availability of large labeled datasets has allowed Convolutional Network models to achieve impressive recognition results.