no code implementations • ICLR 2022 • Raphael Gontijo-Lopes, Yann Dauphin, Ekin D. Cubuk
Despite being able to capture a range of features of the data, high accuracy models trained with supervision tend to make similar predictions.
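As a hedged illustration of how such prediction similarity can be quantified (the arrays below are hypothetical predictions, not results from the paper), one can measure pairwise agreement and error overlap between two trained classifiers:

```python
import numpy as np

# Hypothetical predicted labels from two independently trained classifiers
# and the ground-truth labels, all over the same evaluation set.
preds_a = np.array([2, 0, 1, 1, 3, 2, 0, 1])
preds_b = np.array([2, 0, 1, 2, 3, 2, 0, 0])
labels  = np.array([2, 0, 1, 1, 3, 0, 0, 1])

# Fraction of examples on which the two models predict the same class.
agreement = np.mean(preds_a == preds_b)

# Of the examples model A gets wrong, how many does model B also get wrong?
errors_a = preds_a != labels
error_overlap = np.mean(preds_b[errors_a] != labels[errors_a])

print(f"prediction agreement: {agreement:.2f}")
print(f"shared errors given A errs: {error_overlap:.2f}")
```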
no code implementations • 29 Sep 2021 • Jonas Ngnawé, Marianne Njifon, Jonathan Heek, Yann Dauphin
Deep networks have achieved impressive results on a range of well curated benchmark datasets.
1 code implementation • ICLR 2021 • Lucio M. Dery, Yann Dauphin, David Grangier
In this case, careful consideration is needed to select tasks and model parameterizations such that updates from the auxiliary tasks actually help the primary task.
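A minimal sketch of one such consideration, assuming a shared parameter vector; the gradient-alignment gate here is an illustration rather than the paper's method: the auxiliary update is applied only when its gradient points in roughly the same direction as the primary-task gradient.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two flattened gradient vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Toy gradients of the primary and auxiliary losses w.r.t. shared parameters.
rng = np.random.default_rng(0)
g_primary = rng.normal(size=100)
g_auxiliary = g_primary + 0.5 * rng.normal(size=100)  # partially aligned

# Keep the auxiliary contribution only when it points roughly the same way
# as the primary gradient; otherwise it is likely to hurt the primary task.
alignment = cosine(g_primary, g_auxiliary)
update = g_primary + (g_auxiliary if alignment > 0 else 0.0)
print(f"gradient alignment: {alignment:.2f}")
```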
no code implementations • 26 Jul 2021 • Wojciech Sirko, Sergii Kashubin, Marvin Ritter, Abigail Annkah, Yasser Salah Eddine Bouchareb, Yann Dauphin, Daniel Keysers, Maxim Neumann, Moustapha Cisse, John Quinn
Identifying the locations and footprints of buildings is vital for many practical and scientific purposes.
no code implementations • ICLR 2021 • Yann Dauphin, Ekin Dogus Cubuk
Surprisingly, this simple mechanism matches the $0.8\%$ improvement of the more complex Dropout regularization for the state-of-the-art EfficientNet-B8 model on ImageNet.
no code implementations • 14 Oct 2020 • Atish Agarwala, Jeffrey Pennington, Yann Dauphin, Sam Schoenholz
In this work we develop a theory of early learning for models trained with softmax-cross-entropy loss and show that the learning dynamics depend crucially on the inverse-temperature $\beta$ as well as the magnitude of the logits at initialization, $||\beta{\bf z}||_{2}$.
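A small numerical illustration of the quantity named in the abstract, using hypothetical logits at initialization: scaling the logits by the inverse temperature $\beta$ changes both $||\beta{\bf z}||_{2}$ and the softmax-cross-entropy loss.

```python
import numpy as np

def softmax_xent(logits, label):
    """Softmax cross-entropy of a single example (numerically stable)."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

rng = np.random.default_rng(0)
z = rng.normal(size=10)          # hypothetical logits at initialization
label = 3

for beta in (0.1, 1.0, 10.0):    # inverse temperature
    scaled = beta * z
    print(f"beta={beta:5.1f}  ||beta*z||_2={np.linalg.norm(scaled):6.2f}  "
          f"loss={softmax_xent(scaled, label):.3f}")
```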
1 code implementation • 7 Oct 2020 • Utku Evci, Yani A. Ioannou, Cem Keskin, Yann Dauphin
Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training.
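As a rough sketch of sparse inference (not the training scheme studied in the paper), a fixed binary mask can restrict a toy linear layer to an assumed 10% of its weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense weights of a toy linear layer and a fixed binary sparsity mask
# keeping only 10% of the connections (assumed sparsity level).
w_dense = rng.normal(size=(256, 128))
mask = rng.random(w_dense.shape) < 0.10
w_sparse = w_dense * mask

x = rng.normal(size=128)
y = w_sparse @ x  # inference only touches the surviving 10% of weights

print(f"fraction of weights kept: {mask.mean():.2f}")
print(f"output shape: {y.shape}")
```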
no code implementations • ECCV 2020 • Jiaming Song, Lunjia Hu, Michael Auli, Yann Dauphin, Tengyu Ma
We address this problem by reasoning counterfactually about what the loss distribution of examples with uniform random labels would have been had they been trained with the real examples, and we use this information to remove noisy examples from the training set (a toy version of this filtering step is sketched below).
Ranked #34 on Image Classification on mini WebVision 1.0
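The sketch below is an illustrative, simplified version of such counterfactual filtering: synthetic per-example losses stand in for a trained model, and examples whose loss resembles that of uniform-random-label probes are discarded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-example training losses after a few epochs:
# clean examples tend to have low loss, mislabeled ones stay high.
losses_train = np.concatenate([rng.normal(0.3, 0.1, 900),   # mostly clean
                               rng.normal(2.0, 0.4, 100)])  # mislabeled
# Losses of probe examples that were assigned uniform random labels,
# used as a counterfactual reference for what a noisy example looks like.
losses_probe = rng.normal(2.0, 0.4, 200)

# Keep only training examples whose loss falls below a low percentile of
# the random-label probe distribution; drop the rest as likely noisy.
threshold = np.percentile(losses_probe, 5)
keep = losses_train < threshold
print(f"kept {keep.mean():.0%} of the training set")
```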
2 code implementations • 13 Nov 2019 • Sara Hooker, Aaron Courville, Gregory Clark, Yann Dauphin, Andrea Frome
However, this measure of performance conceals significant differences in how different classes and images are impacted by model compression techniques.
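A hedged sketch of the kind of per-class audit this implies, run on simulated predictions (the class-7 degradation is injected by hand purely for illustration): overall accuracy barely moves while one class is hit hard.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated ground truth and predictions of a dense model (~95% accurate).
labels = rng.integers(0, 10, 5000)
preds_dense = np.where(rng.random(5000) < 0.95, labels, rng.integers(0, 10, 5000))

# Pretend compression disproportionately hurts class 7 (injected for illustration).
hurt = (labels == 7) & (rng.random(5000) < 0.3)
preds_pruned = np.where(hurt, (labels + 1) % 10, preds_dense)

print(f"overall accuracy: dense {np.mean(preds_dense == labels):.3f}, "
      f"pruned {np.mean(preds_pruned == labels):.3f}")
for c in range(10):
    m = labels == c
    delta = np.mean(preds_pruned[m] == c) - np.mean(preds_dense[m] == c)
    print(f"class {c}: accuracy change {delta:+.3f}")
```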
no code implementations • 25 Sep 2019 • Sara Hooker, Yann Dauphin, Aaron Courville, Andrea Frome
Neural network pruning techniques have demonstrated it is possible to remove the majority of weights in a network with surprisingly little degradation to top-1 test set accuracy.
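A minimal magnitude-pruning sketch, assuming a toy weight matrix and a 90% sparsity target; real pipelines typically interleave pruning with fine-tuning.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` of them are gone."""
    k = int(sparsity * w.size)
    threshold = np.sort(np.abs(w), axis=None)[k]
    return np.where(np.abs(w) >= threshold, w, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))

w_pruned = magnitude_prune(w, sparsity=0.9)
print(f"remaining nonzero weights: {np.mean(w_pruned != 0):.2f}")
```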
no code implementations • ICLR 2019 • Jiaming Song, Tengyu Ma, Michael Auli, Yann Dauphin
Memorization in over-parameterized neural networks can severely hurt generalization in the presence of mislabeled examples.
1 code implementation • 12 Mar 2019 • Ryan Lowe, Jakob Foerster, Y-Lan Boureau, Joelle Pineau, Yann Dauphin
How do we know if communication is emerging in a multi-agent system?
no code implementations • ACL 2019 • Angela Fan, Mike Lewis, Yann Dauphin
Writers generally rely on plans or sketches to write long stories, but most current language models generate word by word from left to right.
7 code implementations • ACL 2018 • Angela Fan, Mike Lewis, Yann Dauphin
We explore story generation: creative systems that can build coherent and fluent passages of text about a topic.
no code implementations • EMNLP 2017 • Mike Lewis, Denis Yarats, Yann Dauphin, Devi Parikh, Dhruv Batra
Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions.
no code implementations • ICLR 2018 • Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou
In particular, we present a case that links the two observations: small- and large-batch gradient descent appear to converge to different basins of attraction, but we show that they are in fact connected through a flat region and so belong to the same basin.
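A toy analogue of that connectivity check, making no assumptions about the paper's actual experiments: in an under-determined least-squares problem the zero-loss solutions form a connected affine set, so the loss stays flat along a straight path between two distinct solutions, the same kind of interpolation one would run between two network solutions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Under-determined least-squares problem: its zero-loss solutions form an
# affine subspace, so any two of them are connected by a zero-loss path.
A = rng.normal(size=(5, 10))
b = rng.normal(size=5)

theta_1, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimum-norm solution
null_dir = np.linalg.svd(A)[2][-1]               # a direction in the null space of A
theta_2 = theta_1 + 3.0 * null_dir               # a different zero-loss solution

loss = lambda theta: np.mean((A @ theta - b) ** 2)
for alpha in np.linspace(0.0, 1.0, 5):
    theta = (1 - alpha) * theta_1 + alpha * theta_2
    print(f"alpha={alpha:.2f}  loss={loss(theta):.2e}")
```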
no code implementations • 9 Jun 2017 • Serena Yeung, Anitha Kannan, Yann Dauphin, Li Fei-Fei
The so-called epitomes of this model are groups of mutually exclusive latent factors that compete to explain the data.
1 code implementation • ICML 2017 • Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier
We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1.
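A hedged sketch of one way to nudge a weight matrix toward row-orthonormality so that its spectral norm (and hence Lipschitz constant) stays near 1; the step size `beta`, the initialization scale, and the stand-alone loop are assumptions for illustration, not the paper's exact training procedure.

```python
import numpy as np

def parseval_step(w, beta=0.05):
    """One retraction step pulling W toward W @ W.T = I (rows orthonormal),
    which keeps the layer's spectral norm close to 1."""
    return (1 + beta) * w - beta * (w @ w.T @ w)

rng = np.random.default_rng(0)
# Small initialization scale: the retraction behaves well only when the
# singular values start reasonably close to (and below) 1.
w = rng.normal(scale=0.05, size=(64, 128))

for _ in range(200):   # in practice, interleaved with ordinary gradient steps
    w = parseval_step(w)

print(f"largest singular value: {np.linalg.svd(w, compute_uv=False)[0]:.3f}")
```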
no code implementations • 5 Mar 2015 • Samira Ebrahimi Kahou, Xavier Bouthillier, Pascal Lamblin, Caglar Gulcehre, Vincent Michalski, Kishore Konda, Sébastien Jean, Pierre Froumenty, Yann Dauphin, Nicolas Boulanger-Lewandowski, Raul Chandias Ferrari, Mehdi Mirza, David Warde-Farley, Aaron Courville, Pascal Vincent, Roland Memisevic, Christopher Pal, Yoshua Bengio
The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies.
4 code implementations • NeurIPS 2014 • Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio
Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.
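A small worked example (not from the paper) of why saddle points, rather than bad local minima, can stall these local methods: the gradient vanishes at the saddle while the Hessian has eigenvalues of both signs, and escape along the negative-curvature direction is slow.

```python
import numpy as np

# Toy non-convex surface f(x, y) = x**2 - y**2 with a saddle at the origin:
# the Hessian has one positive and one negative eigenvalue, so the point is
# neither a minimum nor a maximum, yet the gradient there vanishes.
grad = lambda p: np.array([2 * p[0], -2 * p[1]])
hessian = np.array([[2.0, 0.0], [0.0, -2.0]])
print("Hessian eigenvalues at the saddle:", np.linalg.eigvalsh(hessian))

# Gradient descent started close to the saddle creeps along for many steps
# before the negative-curvature direction finally pushes it away.
p = np.array([1.0, 1e-6])
for step in range(60):
    p = p - 0.1 * grad(p)
print("position after 60 small steps:", p)
```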
no code implementations • NeurIPS 2013 • Yann Dauphin, Yoshua Bengio
Sparse high-dimensional data vectors are common in many application domains where one can devise a very large number of features, only a few of which are non-zero for any given example.
no code implementations • 18 Jul 2012 • Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, Salah Rifai
It has previously been hypothesized, and supported with some experimental evidence, that deeper representations, when well trained, tend to do a better job at disentangling the underlying factors of variation.