no code implementations • 28 Mar 2022 • Berfin Simsek, Melissa Hall, Levent Sagun

Existing works show that although modern neural networks achieve remarkable generalization performance on the in-distribution (ID) dataset, the accuracy drops significantly on the out-of-distribution (OOD) datasets \cite{recht2018cifar, recht2019imagenet}.

1 code implementation • 16 Feb 2022 • Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski

Discriminative self-supervised learning allows training models on any random group of internet images and possibly recovering salient information that helps differentiate between the images.

Ranked #1 on Copy Detection on Copydays strong subset (using extra training data)

1 code implementation • 15 Feb 2022 • Priya Goyal, Adriana Romero Soriano, Caner Hazirbas, Levent Sagun, Nicolas Usunier

Systematic diagnosis of fairness, harms, and biases of computer vision systems is an important step towards building socially responsible systems.

no code implementations • 10 Jun 2021 • Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Ari Morcos

Finally, we experiment with initializing the T-CNN from a partially trained CNN, and find that it reaches better performance than the corresponding hybrid model trained from scratch, while reducing training time.

4 code implementations • 19 Mar 2021 • Stéphane d'Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, Levent Sagun

We initialise the GPSA layers to mimic the locality of convolutional layers, then give each attention head the freedom to escape locality by adjusting a gating parameter regulating the attention paid to position versus content information.

Ranked #261 on Image Classification on ImageNet

1 code implementation • NeurIPS 2021 • Stéphane d'Ascoli, Marylou Gabrié, Levent Sagun, Giulio Biroli

One of the central puzzles in modern machine learning is the ability of heavily overparametrized models to generalize well.

no code implementations • 25 Jun 2020 • Levent Sagun, Caglar Gulcehre, Adriana Romero, Negar Rostamzadeh, Stefano Sarao Mannelli

Science meets Engineering in Deep Learning took place in Vancouver as part of the Workshop section of NeurIPS 2019.

1 code implementation • NeurIPS 2020 • Stéphane d'Ascoli, Levent Sagun, Giulio Biroli

We show that this peak is implicitly regularized by the nonlinearity, which is why it only becomes salient at high noise and is weakly affected by explicit regularization.

no code implementations • 29 Nov 2019 • Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, Levent Sagun

This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion.

no code implementations • NeurIPS 2019 • Stéphane d'Ascoli, Levent Sagun, Joan Bruna, Giulio Biroli

The aim of this work is to understand this fact through the lens of dynamics in the loss landscape.

1 code implementation • 18 Jan 2019 • Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban

This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion.

1 code implementation • 6 Jan 2019 • Mario Geiger, Arthur Jacot, Stefano Spigler, Franck Gabriel, Levent Sagun, Stéphane d'Ascoli, Giulio Biroli, Clément Hongler, Matthieu Wyart

At this threshold, we argue that $\|f_{N}\|$ diverges.

no code implementations • 22 Oct 2018 • Stefano Spigler, Mario Geiger, Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Matthieu Wyart

We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved.

2 code implementations • 25 Sep 2018 • Mario Geiger, Stefano Spigler, Stéphane d'Ascoli, Levent Sagun, Marco Baity-Jesi, Giulio Biroli, Matthieu Wyart

In the vicinity of this transition, properties of the curvature of the minima of the loss are critical.

no code implementations • ICML 2018 • Marco Baity-Jesi, Levent Sagun, Mario Geiger, Stefano Spigler, Gerard Ben Arous, Chiara Cammarota, Yann Lecun, Matthieu Wyart, Giulio Biroli

We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems.

no code implementations • ICLR 2018 • Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou

In particular, we present a case that links the two observations: small and large batch gradient descent appear to converge to different basins of attraction but we show that they are in fact connected through their flat region and so belong to the same basin.

3 code implementations • 18 Apr 2017 • Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, Kyunghyun Cho

We publicly release a new large-scale dataset, called SearchQA, for machine comprehension, or question-answering.

no code implementations • 23 Mar 2017 • Andrew J. Ballard, Ritankar Das, Stefano Martiniani, Dhagash Mehta, Levent Sagun, Jacob D. Stevenson, David J. Wales

Machine learning techniques are being increasingly used as flexible non-linear fitting and prediction tools in the physical sciences.

no code implementations • 22 Nov 2016 • Levent Sagun, Leon Bottou, Yann Lecun

We look at the eigenvalues of the Hessian of a loss function before and after training.

2 code implementations • 6 Nov 2016 • Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann Lecun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina

This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape.

no code implementations • 19 Nov 2015 • Levent Sagun, Thomas Trogdon, Yann Lecun

Given an algorithm, which we take to be both the optimization routine and the form of the random landscape, the fluctuations of the halting time follow a distribution that, after centering and scaling, remains unchanged even when the distribution on the landscape is changed.

no code implementations • 20 Dec 2014 • Levent Sagun, V. Ugur Guney, Gerard Ben Arous, Yann Lecun

Finding minima of a real valued non-convex function over a high dimensional space is a major challenge in science.

Papers With Code is a free resource with all data licensed under CC-BY-SA.