1 code implementation • 31 Oct 2023 • Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, Etai Littwin
Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms.
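The core idea named here, maximizing a reward function with a policy-gradient update, can be sketched minimally. The 2-armed bandit task, reward values, and learning rate below are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)             # logits of a softmax policy over 2 actions
rewards = np.array([0.1, 1.0])  # assumed reward function: action 1 is better

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(500):
    p = softmax(theta)
    a = rng.choice(2, p=p)       # sample an action from the policy
    r = rewards[a]
    # REINFORCE update: grad of log pi(a) under a softmax is one_hot(a) - p
    grad = -p
    grad[a] += 1.0
    theta += 0.1 * r * grad      # ascend the reward-weighted gradient

final_p = softmax(theta)
```

After training, the policy concentrates on the higher-reward action, which is the behavior policy-gradient RFT relies on at scale.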
1 code implementation • 15 Oct 2023 • Enric Boix-Adsera, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind
We investigate the capabilities of transformer models on relational reasoning tasks.
no code implementations • NeurIPS 2023 • Enric Boix-Adsera, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua Susskind
Our experiments support the theory and also show that the phenomenon can occur in practice without the simplifying assumptions.
1 code implementation • 15 Jul 2022 • Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind
This pretraining strategy, which has been used in BERT models in NLP, Wav2Vec models in Speech, and, recently, MAE models in Vision, forces the model to learn about relationships between the content in different parts of the input using autoencoding-related objectives.
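The masked-reconstruction objective described above can be sketched in a few lines. The data, the 75% mask ratio, and the identity-plus-noise stand-in "model" are illustrative assumptions; only the pattern of scoring reconstruction on masked positions reflects the BERT/MAE-style setup:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))        # batch of 4 inputs, 16 features each
mask = rng.random(x.shape) < 0.75   # hide ~75% of positions

x_visible = np.where(mask, 0.0, x)  # the model only sees unmasked content
# Stand-in for a real encoder-decoder: the visible input plus small noise.
pred = x_visible + rng.normal(scale=0.1, size=x.shape)

# The loss is computed only on the masked positions the model must reconstruct.
loss = np.mean((pred[mask] - x[mask]) ** 2)
```

Restricting the loss to masked positions is what forces the model to infer hidden content from the visible context rather than copy its input.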
no code implementations • 10 Jun 2022 • Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss, Joshua Susskind
While common and easily reproduced in more general settings, the Slingshot Mechanism does not follow from any optimization theory that we are aware of, and can be easily overlooked without an in-depth examination.
no code implementations • 28 Jan 2022 • Martin Bertran, Walter Talbott, Nitish Srivastava, Joshua Susskind
Learning generalizable policies from visual input in the presence of visual distractions is a challenging problem in reinforcement learning.
2 code implementations • 17 May 2021 • Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh
Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration.
no code implementations • NeurIPS 2020 • Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan
Modern neural network performance typically improves as model size increases.