no code implementations • ICML 2020 • Samy Jelassi, Carles Domingo-Enrich, Damien Scieur, Arthur Mensch, Joan Bruna

Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs.

no code implementations • 3 Jul 2024 • Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach

Length generalization refers to the ability to extrapolate from short training sequences to long test sequences and is a challenge for current large language models.

no code implementations • 1 Jul 2024 • Ahmet Cagri Duzgun, Samy Jelassi, Yuanzhi Li

We first examine the expressivity of the features of these models, and show that the feature space of overparameterized networks cannot be spanned by concatenating many underparameterized features, and vice versa.

1 code implementation • 22 Feb 2024 • Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candidate completions.
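A minimal sketch of the reweighting idea described above, with hypothetical names and toy data (not the paper's implementation): a linear probe `w` scores each candidate from its embedding, and that score is added to the model's own log-probability.

```python
import numpy as np

def reweight_candidates(base_scores, embeddings, w, alpha=1.0):
    """Combine a model's own scores with a learned linear probe.

    base_scores: (n,) log-probabilities of n candidate completions
    embeddings:  (n, d) embedding of each candidate
    w:           (d,) learned linear-probe weights
    """
    probe_scores = embeddings @ w          # linear function on the embedding space
    return base_scores + alpha * probe_scores

# toy example: the probe prefers the second candidate
base = np.array([0.0, -0.5])
emb = np.array([[1.0, 0.0], [0.0, 1.0]])
w = np.array([0.0, 2.0])
best = int(np.argmax(reweight_candidates(base, emb, w)))
```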

2 code implementations • 1 Feb 2024 • Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach

Empirically, we find that transformers outperform GSSMs in terms of efficiency and generalization on synthetic tasks that require copying the context.

no code implementations • 27 Jun 2023 • Samy Jelassi, Stéphane d'Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li, François Charton

We find that relative position embeddings enable length generalization for simple tasks, such as addition: models trained on $5$-digit numbers can perform $15$-digit sums.
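One common form of relative position embedding (a T5-style additive bias, shown here as a generic sketch rather than the paper's exact construction) illustrates why length generalization is plausible: the bias depends only on the clipped offset between positions, so a model trained on short sequences encounters no unseen position indices at longer test lengths.

```python
import numpy as np

def relative_position_bias(seq_len, bias_table, max_distance):
    """Bias added to attention logits, indexed only by the clipped offset j - i.

    bias_table: (2 * max_distance + 1,) learned scalars, one per clipped offset
    """
    idx = np.arange(seq_len)
    rel = idx[None, :] - idx[:, None]                # signed offset j - i
    rel = np.clip(rel, -max_distance, max_distance)  # long-range offsets share a bias
    return bias_table[rel + max_distance]            # (seq_len, seq_len)
```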

no code implementations • 13 May 2023 • Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar

In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization.

no code implementations • 13 Oct 2022 • Samy Jelassi, Michael E. Sander, Yuanzhi Li

On the theoretical side, we consider a binary classification task and show that while the learning problem admits multiple solutions that generalize, our model implicitly learns the spatial structure of the dataset while generalizing: we call this phenomenon patch association.

no code implementations • 9 Oct 2022 • Samy Jelassi, David Dobre, Arthur Mensch, Yuanzhi Li, Gauthier Gidel

By considering an update rule with the magnitude of the Adam update and the normalized direction of SGD, we empirically show that the adaptive magnitude of Adam is key for GAN training.
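The update rule described above can be sketched as follows (an illustrative reconstruction, not the authors' code): compute a standard Adam step, keep only its magnitude, and point it along the normalized gradient direction.

```python
import numpy as np

def hybrid_step(g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One step combining Adam's step magnitude with SGD's direction."""
    m = b1 * m + (1 - b1) * g                   # Adam first moment
    v = b2 * v + (1 - b2) * g**2                # Adam second moment
    m_hat = m / (1 - b1**t)                     # bias correction
    v_hat = v / (1 - b2**t)
    adam_step = lr * m_hat / (np.sqrt(v_hat) + eps)
    sgd_dir = g / (np.linalg.norm(g) + eps)     # normalized SGD direction
    step = np.linalg.norm(adam_step) * sgd_dir  # Adam magnitude, SGD direction
    return step, m, v
```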

no code implementations • 13 Jul 2022 • Samy Jelassi, Yuanzhi Li

Stochastic gradient descent (SGD) with momentum is widely used for training modern deep learning architectures.
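For reference, the heavy-ball form of SGD with momentum that this line refers to is:

```python
def sgd_momentum_step(w, g, buf, lr=0.1, beta=0.9):
    """One heavy-ball step: buf accumulates a decayed sum of past gradients."""
    buf = beta * buf + g
    w = w - lr * buf
    return w, buf
```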

no code implementations • 29 Sep 2021 • Samy Jelassi, Arthur Mensch, Gauthier Gidel, Yuanzhi Li

We empirically show that SGDA with the same vector norm as Adam reaches similar or even better performance than the latter.

no code implementations • 2 Feb 2021 • Luca Venturi, Samy Jelassi, Tristan Ozuch, Joan Bruna

The first contribution of this paper is to extend such results to a more general class of functions, namely functions with piece-wise oscillatory structure, by building on the proof strategy of (Eldan and Shamir, 2016).

5 code implementations • 26 Jan 2021 • Aaron Defazio, Samy Jelassi

We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods.
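A heavily hedged sketch of a MADGRAD-style loop, paraphrased from the paper's description (dual averaging of gradients and squared gradients, a cube-root denominator, and momentum toward the dual-averaged point); the authors' released implementation is the authoritative reference, and the constants here are illustrative.

```python
import numpy as np

def madgrad_sketch(grad_fn, x0, lr=1e-2, momentum=0.9, eps=1e-6, steps=200):
    """MADGRAD-style update sketch on a deterministic objective."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    s = np.zeros_like(x)   # running weighted gradient sum (dual average)
    v = np.zeros_like(x)   # running weighted squared-gradient sum
    for k in range(steps):
        g = grad_fn(x)
        lam = lr * np.sqrt(k + 1)         # increasing step-size weights
        s = s + lam * g
        v = v + lam * g**2
        z = x0 - s / (np.cbrt(v) + eps)   # dual-averaged iterate, cube-root scaling
        x = (1 - momentum) * x + momentum * z
    return x

x_min = madgrad_sketch(lambda x: 2 * x, [1.0])  # minimize f(x) = x^2
```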

no code implementations • 20 Oct 2020 • Samy Jelassi, Aaron Defazio

First-order stochastic optimization methods are currently the most widely used class of methods for training deep neural networks.

no code implementations • ICLR 2021 • Jad Rahme, Samy Jelassi, S. Matthew Weinberg

This not only circumvents the need for an expensive hyper-parameter search (as in prior work), but also provides a principled metric to compare the performance of two auctions (absent from prior work).

1 code implementation • 2 Mar 2020 • Jad Rahme, Samy Jelassi, Joan Bruna, S. Matthew Weinberg

Designing an incentive compatible auction that maximizes expected revenue is a central problem in Auction Design.

no code implementations • NeurIPS 2020 • Carles Domingo-Enrich, Samy Jelassi, Arthur Mensch, Grant Rotskoff, Joan Bruna

Our method identifies mixed equilibria in high dimensions and is demonstrably effective for training mixtures of GANs.

1 code implementation • NeurIPS 2019 • Othmane Sebbouh, Nidham Gazagnadou, Samy Jelassi, Francis Bach, Robert M. Gower

Among the very first variance reduced stochastic methods for solving the empirical risk minimization problem was the SVRG method (Johnson & Zhang 2013).
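The SVRG control-variate idea referenced above can be sketched directly from Johnson & Zhang (2013): each epoch fixes a snapshot, computes one full gradient there, and corrects each stochastic gradient with the snapshot's, so the correction has zero mean and shrinking variance. Function names and the toy problem are illustrative.

```python
import numpy as np

def svrg(grad_i, n, w0, lr=0.5, epochs=5, inner=None, seed=0):
    """SVRG sketch: grad_i(w, i) returns the gradient of the i-th loss term."""
    rng = np.random.default_rng(seed)
    inner = inner or n
    w = np.asarray(w0, dtype=float)
    for _ in range(epochs):
        w_snap = w.copy()
        # full gradient at the snapshot, recomputed once per epoch
        full = sum(grad_i(w_snap, i) for i in range(n)) / n
        for _ in range(inner):
            i = rng.integers(n)
            # control variate: unbiased, variance shrinks as w -> w_snap
            g = grad_i(w, i) - grad_i(w_snap, i) + full
            w = w - lr * g
    return w

# least squares: f_i(w) = 0.5 * (w - a_i)^2, minimized at mean(a)
a = np.array([1.0, 2.0, 3.0])
w_star = svrg(lambda w, i: w - a[i], n=3, w0=np.array([0.0]))
```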

1 code implementation • 29 May 2019 • Samy Jelassi, Carles Domingo-Enrich, Damien Scieur, Arthur Mensch, Joan Bruna

Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs.

no code implementations • 5 Feb 2019 • Grant Rotskoff, Samy Jelassi, Joan Bruna, Eric Vanden-Eijnden

Neural networks with a large number of parameters admit a mean-field description, which has recently served as a theoretical explanation for the favorable training properties of "overparameterized" models.

no code implementations • NeurIPS 2018 • Thomas Pumir, Samy Jelassi, Nicolas Boumal

To overcome scalability issues, Burer and Monteiro proposed a factorized approach based on optimizing over an $n \times k$ matrix $Y$ such that $X = YY^*$ is the SDP variable.
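The factorization trick can be illustrated on a toy problem (a Frobenius-norm fit rather than a general SDP objective, with hypothetical function names): optimizing over $Y$ makes $X = YY^\top$ positive semidefinite and rank at most $k$ by construction, so no explicit PSD constraint on the $n \times n$ variable is needed.

```python
import numpy as np

def burer_monteiro_fit(A, k, steps=2000, lr=0.01, seed=0):
    """Fit X = Y @ Y.T to a symmetric PSD matrix A by gradient descent on Y."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Y = 0.1 * rng.standard_normal((n, k))   # small random init
    for _ in range(steps):
        R = Y @ Y.T - A                     # symmetric residual
        Y = Y - lr * 4 * R @ Y              # gradient of ||Y Y^T - A||_F^2
    return Y

A = np.diag([2.0, 1.0, 0.0])                # PSD, rank 2
Y = burer_monteiro_fit(A, k=2)
X = Y @ Y.T                                 # PSD, rank <= 2 by construction
```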

Papers With Code is a free resource with all data licensed under CC-BY-SA.