Search Results for author: Yann Dauphin

Found 21 papers, 5 papers with code

No One Representation to Rule Them All: Overlapping Features of Training Methods

no code implementations20 Oct 2021 Raphael Gontijo-Lopes, Yann Dauphin, Ekin D. Cubuk

Despite being able to capture a range of features of the data, high accuracy models trained with supervision tend to make similar predictions.

Contrastive Learning

Auxiliary Task Update Decomposition: The Good, The Bad and The Neutral

1 code implementation ICLR 2021 Lucio M. Dery, Yann Dauphin, David Grangier

In this case, careful consideration is needed to select tasks and model parameterizations such that updates from the auxiliary tasks actually help the primary task.

Image Classification

Deconstructing the Regularization of BatchNorm

no code implementations ICLR 2021 Yann Dauphin, Ekin Dogus Cubuk

Surprisingly, this simple mechanism matches the improvement of $0.8\%$ of the more complex Dropout regularization for the state-of-the-art EfficientNet-B8 model on ImageNet.

Temperature check: theory and practice for training models with softmax-cross-entropy losses

no code implementations14 Oct 2020 Atish Agarwala, Jeffrey Pennington, Yann Dauphin, Sam Schoenholz

In this work we develop a theory of early learning for models trained with softmax-cross-entropy loss and show that the learning dynamics depend crucially on the inverse-temperature $\beta$ as well as the magnitude of the logits at initialization, $||\beta{\bf z}||_{2}$.

Sentiment Analysis
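The snippet above describes a softmax-cross-entropy loss whose dynamics depend on an inverse temperature $\beta$ scaling the logits. A minimal NumPy sketch of such a loss (the function name and defaults are illustrative, not taken from the paper):

```python
import numpy as np

def softmax_xent(logits, labels, beta=1.0):
    """Softmax cross-entropy with an inverse-temperature beta.

    Scaling the logits by beta changes the effective logit magnitude
    ||beta * z||_2, which the paper above identifies as controlling
    early learning dynamics.
    """
    z = beta * np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Note that a smaller beta flattens the softmax distribution, so the loss on confidently correct logits increases as beta shrinks.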

Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

no code implementations7 Oct 2020 Utku Evci, Yani A. Ioannou, Cem Keskin, Yann Dauphin

We show that sparse NNs have poor gradient flow at initialization and propose a modified initialization for unstructured connectivity.

Robust and On-the-fly Dataset Denoising for Image Classification

no code implementations ECCV 2020 Jiaming Song, Lunjia Hu, Michael Auli, Yann Dauphin, Tengyu Ma

We address this problem by reasoning counterfactually about the loss distribution of examples with uniform random labels had they been trained with the real examples, and use this information to remove noisy examples from the training set.

Classification Denoising +2

What Do Compressed Deep Neural Networks Forget?

2 code implementations13 Nov 2019 Sara Hooker, Aaron Courville, Gregory Clark, Yann Dauphin, Andrea Frome

However, this measure of performance conceals significant differences in how different classes and images are impacted by model compression techniques.

Fairness Interpretability Techniques for Deep Learning +4

Selective Brain Damage: Measuring the Disparate Impact of Model Pruning

no code implementations25 Sep 2019 Sara Hooker, Yann Dauphin, Aaron Courville, Andrea Frome

Neural network pruning techniques have demonstrated it is possible to remove the majority of weights in a network with surprisingly little degradation to top-1 test set accuracy.

Network Pruning

Better Generalization with On-the-fly Dataset Denoising

no code implementations ICLR 2019 Jiaming Song, Tengyu Ma, Michael Auli, Yann Dauphin

Memorization in over-parameterized neural networks can severely hurt generalization in the presence of mislabeled examples.

Denoising

Strategies for Structuring Story Generation

no code implementations ACL 2019 Angela Fan, Mike Lewis, Yann Dauphin

Writers generally rely on plans or sketches to write long stories, but most current language models generate word by word from left to right.

Story Generation

Hierarchical Neural Story Generation

6 code implementations ACL 2018 Angela Fan, Mike Lewis, Yann Dauphin

We explore story generation: creative systems that can build coherent and fluent passages of text about a topic.

Story Generation

Deal or No Deal? End-to-End Learning of Negotiation Dialogues

no code implementations EMNLP 2017 Mike Lewis, Denis Yarats, Yann Dauphin, Devi Parikh, Dhruv Batra

Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions.

Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

no code implementations ICLR 2018 Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou

In particular, we present a case that links the two observations: small and large batch gradient descent appear to converge to different basins of attraction but we show that they are in fact connected through their flat region and so belong to the same basin.

Tackling Over-pruning in Variational Autoencoders

no code implementations9 Jun 2017 Serena Yeung, Anitha Kannan, Yann Dauphin, Li Fei-Fei

The so-called epitomes of this model are groups of mutually exclusive latent factors that compete to explain the data.

Parseval Networks: Improving Robustness to Adversarial Examples

no code implementations ICML 2017 Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier

We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1.
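The paper maintains the Lipschitz constraint by keeping weight matrices close to Parseval tight frames. A minimal NumPy sketch of that orthonormality retraction step; the step size `beta` here is an illustrative value, not the paper's setting:

```python
import numpy as np

def parseval_retraction(W, beta=0.5):
    """One retraction step nudging the rows of W toward a Parseval
    tight frame (W @ W.T ~ I), which keeps the layer's Lipschitz
    constant close to 1."""
    return (1 + beta) * W - beta * W @ W.T @ W
```

In Parseval networks a step of this kind is interleaved with the usual gradient updates so the weight matrices stay approximately row-orthonormal throughout training.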

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

3 code implementations NeurIPS 2014 Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.
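The paper's proposed remedy, the saddle-free Newton method, rescales the gradient by the inverse *absolute* Hessian so that negative-curvature directions repel rather than attract the iterate. A hedged NumPy sketch (the damping value is illustrative):

```python
import numpy as np

def saddle_free_newton_step(grad, hessian, damping=1e-3):
    """Saddle-free Newton update direction: |H|^-1 @ grad.

    Taking absolute eigenvalues of the Hessian makes the update move
    away from saddle points, where plain Newton would step toward them.
    """
    eigvals, eigvecs = np.linalg.eigh(hessian)
    inv_abs = 1.0 / (np.abs(eigvals) + damping)  # invert |eigenvalues|
    return eigvecs @ (inv_abs * (eigvecs.T @ grad))
```

On the toy saddle f(x, y) = x^2 - y^2 at the point (1, 1), plain Newton's step H^-1 g lands exactly on the saddle at the origin, while this update descends along the negative-curvature y direction.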

Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs

no code implementations NeurIPS 2013 Yann Dauphin, Yoshua Bengio

Sparse high-dimensional data vectors are common in many application domains where a very large number of rarely non-zero features can be devised.

Text Classification

Better Mixing via Deep Representations

no code implementations18 Jul 2012 Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, Salah Rifai

It has previously been hypothesized, and supported with some experimental evidence, that deeper representations, when well trained, tend to do a better job at disentangling the underlying factors of variation.
