Search Results for author: Yann N. Dauphin

Found 18 papers, 10 papers with code

Neglected Hessian component explains mysteries in Sharpness regularization

no code implementations • 19 Jan 2024 • Yann N. Dauphin, Atish Agarwala, Hossein Mobahi

We find that regularizing feature exploitation but not feature exploration yields performance similar to gradient penalties.

Paper
Add Code

Has the Machine Learning Review Process Become More Arbitrary as the Field Has Grown? The NeurIPS 2021 Consistency Experiment

no code implementations • 5 Jun 2023 • Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan

We present the NeurIPS 2021 consistency experiment, a larger-scale variant of the 2014 NeurIPS experiment in which 10% of conference submissions were reviewed by two independent committees to quantify the randomness in the review process.

Paper
Add Code

SAM operates far from home: eigenvalue regularization as a dynamical phenomenon

no code implementations • 17 Feb 2023 • Atish Agarwala, Yann N. Dauphin

We show that in a simplified setting, SAM dynamically induces a stabilization related to the edge of stability (EOS) phenomenon observed in large learning rate gradient descent.

Paper
Add Code

How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?

no code implementations • 22 Nov 2022 • Charvi Rastogi, Ivan Stelmakh, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, Zhenyu Xue, Hal Daumé III, Emma Pierson, Nihar B. Shah

In a top-tier computer science conference (NeurIPS 2021) with more than 23, 000 submitting authors and 9, 000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception about their own papers after seeing the reviews.

Paper
Add Code

MetaInit: Initializing learning by learning to initialize

no code implementations • NeurIPS 2019 • Yann N. Dauphin, Samuel Schoenholz

In particular, we find that this approach outperforms normalization for networks without skip connections on CIFAR-10 and can scale to Resnet-50 models on Imagenet.

Meta-Learning

Paper
Add Code

Simple and Effective Noisy Channel Modeling for Neural Machine Translation

1 code implementation • IJCNLP 2019 • Kyra Yee, Nathan Ng, Yann N. Dauphin, Michael Auli

Previous work on neural noisy channel modeling relied on latent variable models that incrementally process the source and target sentence.

Machine Translation Sentence +1

29,233

Paper
Code

Pay Less Attention with Lightweight and Dynamic Convolutions

4 code implementations • ICLR 2019 • Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

We predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements.

Ranked #1 on Machine Translation on WMT 2017 English-Chinese

Abstractive Text Summarization Language Modelling +2

29,233

Paper
Code

Fixup Initialization: Residual Learning Without Normalization

7 code implementations • ICLR 2019 • Hongyi Zhang, Yann N. Dauphin, Tengyu Ma

Normalization layers are a staple in state-of-the-art deep neural network architectures.

Ranked #9 on Image Classification on SVHN

General Classification Image Classification +2

149

Paper
Code

mixup: Beyond Empirical Risk Minimization

71 code implementations • ICLR 2018 • Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz

We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.

Ranked #1 on Out-of-Distribution Generalization on ImageNet-W

Domain Generalization Memorization +2

29,735

Paper
Code

Deal or No Deal? End-to-End Learning for Negotiation Dialogues

2 code implementations • 16 Jun 2017 • Mike Lewis, Denis Yarats, Yann N. Dauphin, Devi Parikh, Dhruv Batra

Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions.

1,375

Paper
Code

Convolutional Sequence to Sequence Learning

37 code implementations • ICML 2017 • Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks.

Ranked #3 on Bangla Spelling Error Correction on DPCSpell-Bangla-SEC-Corpus

Bangla Spelling Error Correction Image Classification +2

29,237

Paper
Code

Language Modeling with Gated Convolutional Networks

10 code implementations • ICML 2017 • Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier

The pre-dominant approach to language modeling to date is based on recurrent neural networks.

Ranked #18 on Language Modelling on One Billion Word

Language Modelling Sentence

29,237

Paper
Code

A Convolutional Encoder Model for Neural Machine Translation

2 code implementations • ACL 2017 • Jonas Gehring, Michael Auli, David Grangier, Yann N. Dauphin

The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence.

Ranked #7 on Machine Translation on IWSLT2015 German-English

Machine Translation Sentence +1

29,237

Paper
Code

Theano: A Python framework for fast computation of mathematical expressions

1 code implementation • 9 May 2016 • The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang

Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.

BIG-bench Machine Learning Clustering +2

9,852

Paper
Code

Predicting distributions with Linearizing Belief Networks

no code implementations • 17 Nov 2015 • Yann N. Dauphin, David Grangier

Contrary to a classical neural network, a belief network can predict more than the expected value of the output $Y$ given the input $X$.

Facial expression generation Image Denoising +1

Paper
Add Code

Equilibrated adaptive learning rates for non-convex optimization

2 code implementations • NeurIPS 2015 • Yann N. Dauphin, Harm de Vries, Yoshua Bengio

Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks.

Paper
Code

On the saddle point problem for non-convex optimization

no code implementations • 19 May 2014 • Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for the ability of these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.

Paper
Add Code

Zero-Shot Learning for Semantic Utterance Classification

no code implementations • 20 Dec 2013 • Yann N. Dauphin, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck

We propose a novel zero-shot learning method for semantic utterance classification (SUC).

Classification General Classification +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.