Search Results for author: Omid Saremi

Found 8 papers, 2 papers with code

LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures

no code implementations • 7 Dec 2023 • Vimal Thilak, Chen Huang, Omid Saremi, Laurent Dinh, Hanlin Goh, Preetum Nakkiran, Joshua M. Susskind, Etai Littwin

In this paper, we introduce LiDAR (Linear Discriminant Analysis Rank), a metric designed to measure the quality of representations within joint embedding (JE) architectures.
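
The abstract gives enough to sketch the idea: LiDAR scores representations by a smooth rank of the linear-discriminant (LDA) matrix built from embeddings grouped by source sample. Below is a minimal NumPy sketch under that reading; the function names, the regularizer `delta`, and the entropy-based smooth rank are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def smooth_rank(matrix, eps=1e-7):
    # Smooth rank: exponential of the entropy of the normalized
    # singular-value distribution (a RankMe-style estimator).
    s = np.linalg.svd(matrix, compute_uv=False)
    p = s / (s.sum() + eps) + eps
    return float(np.exp(-(p * np.log(p)).sum()))

def lidar_metric(embeddings, labels, delta=1e-4):
    # embeddings: (n, d) array; labels: (n,) group ids, where each
    # group holds embeddings of augmented views of one sample.
    d = embeddings.shape[1]
    mu = embeddings.mean(axis=0)
    classes = np.unique(labels)
    Sb = np.zeros((d, d))  # between-class scatter
    Sw = np.zeros((d, d))  # within-class scatter
    for c in classes:
        X = embeddings[labels == c]
        mc = X.mean(axis=0)
        Sb += np.outer(mc - mu, mc - mu)
        Xc = X - mc
        Sw += Xc.T @ Xc / max(len(X) - 1, 1)
    Sb /= len(classes)
    Sw = Sw / len(classes) + delta * np.eye(d)
    # LDA matrix via whitening: Sw^{-1/2} Sb Sw^{-1/2}
    evals, evecs = np.linalg.eigh(Sw)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return smooth_rank(W @ Sb @ W)
```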

Vanishing Gradients in Reinforcement Finetuning of Language Models

1 code implementation • 31 Oct 2023 • Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, Etai Littwin

Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms.
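
As a concrete picture of the policy-gradient objective the abstract refers to, here is a toy REINFORCE loop with a small categorical policy standing in for the language model and a hypothetical `reward_fn` standing in for the (possibly learned) reward; it is a sketch of the general RFT setup, not the paper's experimental code.

```python
import torch

logits = torch.nn.Parameter(torch.zeros(8))   # toy "policy" over 8 tokens
optimizer = torch.optim.Adam([logits], lr=1e-1)

def reward_fn(token):                          # hypothetical reward model
    return 1.0 if token.item() == 3 else 0.0

for _ in range(500):
    dist = torch.distributions.Categorical(logits=logits)
    token = dist.sample()
    # REINFORCE: maximize E[reward] by weighting log-prob with reward.
    loss = -reward_fn(token) * dist.log_prob(token)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

If `reward_fn` returned the same value for every sample the policy actually draws, the expected update would be exactly zero (since E[∇ log p] = 0), a minimal illustration of the vanishing-gradient issue the title refers to.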

What Algorithms can Transformers Learn? A Study in Length Generalization

no code implementations • 24 Oct 2023 • Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran

Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity.
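
Parity is a useful concrete case: the task is trivially specified, yet models trained on short strings often fail on longer ones. A minimal sketch of a length-generalization split follows; the length cutoffs and sample counts are arbitrary choices, not the paper's.

```python
import random

def parity_example(length):
    # Output 1 iff the bit string contains an odd number of ones.
    bits = [random.randint(0, 1) for _ in range(length)]
    return bits, sum(bits) % 2

# Train on short strings, evaluate on strictly longer ones to probe
# length generalization rather than in-distribution accuracy.
train = [parity_example(random.randint(1, 20)) for _ in range(10_000)]
test = [parity_example(random.randint(21, 40)) for _ in range(1_000)]
```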

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

no code implementations • 13 Oct 2023 • Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio

We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph).

Task: Retrieval
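
One widely used form of the adaptive computation the abstract describes is ACT-style halting, where a single shared block is applied a variable number of times per input. The sketch below is a generic, simplified version of that idea (no remainder bookkeeping), not the architecture from the paper.

```python
import torch
import torch.nn as nn

class AdaptiveBlock(nn.Module):
    # ACT-style adaptive depth: one shared layer is applied repeatedly,
    # and a learned halting unit decides per position when to stop.
    def __init__(self, dim, max_steps=8, threshold=0.99):
        super().__init__()
        # dim must be divisible by nhead.
        self.block = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.halt = nn.Linear(dim, 1)
        self.max_steps, self.threshold = max_steps, threshold

    def forward(self, x):                      # x: (batch, seq, dim)
        halted = torch.zeros(x.shape[:2], device=x.device)  # cumulative halt prob
        out = torch.zeros_like(x)
        for _ in range(self.max_steps):
            x = self.block(x)
            p = torch.sigmoid(self.halt(x)).squeeze(-1)
            still = (halted < self.threshold).float()       # positions still running
            out = out + (still * p).unsqueeze(-1) * x       # halt-weighted output
            halted = halted + still * p
            if bool((halted >= self.threshold).all()):
                break
        return out

h = AdaptiveBlock(64)(torch.randn(2, 10, 64))  # usage: variable effective depth
```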

The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

no code implementations • 10 Jun 2022 • Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss, Joshua Susskind

While common and easily reproduced in more general settings, the Slingshot Mechanism does not follow from any optimization theory we are aware of, and it can easily be overlooked without an in-depth examination.

Task: Inductive Bias
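
Since the phenomenon is diagnosed empirically, the natural first step is instrumentation. The paper associates the effect with cyclic spikes in the norm of the last layer's weights under adaptive optimizers; the helper below is our illustration of such logging, not the paper's code.

```python
import torch

def train_logging_norm(model, final_layer, loss_fn, data, lr=1e-3, steps=10_000):
    # Train with an adaptive optimizer and record the final layer's
    # weight norm each step; cyclic spikes in this curve are the
    # signature associated with the Slingshot Mechanism.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    norms = []
    for step in range(steps):
        x, y = data[step % len(data)]
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        norms.append(final_layer.weight.norm().item())
    return norms
```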

Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks

no code implementations • 2 Jul 2021 • Shih-Yu Sun, Vimal Thilak, Etai Littwin, Omid Saremi, Joshua M. Susskind

Deep linear networks trained with gradient descent yield low-rank solutions, as is typically studied in matrix factorization.
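
The low-rank bias is easy to reproduce in the simplest setting the abstract mentions: a two-factor linear network fit to a low-rank target by gradient descent from small initialization, where the product's singular values emerge greedily. Sizes, learning rate, and step counts below are illustrative, and this is plain matrix factorization rather than the paper's autoencoder setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 20, 3
U = rng.normal(size=(d, r))
V = rng.normal(size=(r, d))
target = U @ V                               # rank-3 target matrix

scale, lr = 1e-3, 0.01                       # small init drives the low-rank bias
W1 = scale * rng.normal(size=(d, d))
W2 = scale * rng.normal(size=(d, d))
for step in range(2000):
    E = W2 @ W1 - target                     # residual
    g2, g1 = E @ W1.T, W2.T @ E              # gradients of 0.5 * ||E||_F^2
    W2 -= lr * g2
    W1 -= lr * g1
    if step % 400 == 0:
        s = np.linalg.svd(W2 @ W1, compute_uv=False)
        print(step, np.round(s[:5], 2))      # top singular values fill in one by one
```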
