Search Results for author: Yoshua Bengio

Found 576 papers, 295 papers with code

Learning the 2-D Topology of Images

no code implementations NeurIPS 2007 Nicolas L. Roux, Yoshua Bengio, Pascal Lamblin, Marc Joliveau, Balázs Kégl

We study the following question: is the two-dimensional structure of images a very strong prior or is it something that can be learned with a few examples of natural images?

Topmoumoute Online Natural Gradient Algorithm

no code implementations NeurIPS 2007 Nicolas L. Roux, Pierre-Antoine Manzagol, Yoshua Bengio

Guided by the goal of obtaining an optimization algorithm that is both fast and yielding good generalization, we study the descent direction maximizing the decrease in generalization error or the probability of not increasing generalization error.

Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

no code implementations NeurIPS 2009 Yoshua Bengio, James S. Bergstra

We introduce a new type of neural network activation function based on recent physiological rate models for complex cells in visual area V1.

An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism

no code implementations NeurIPS 2009 Douglas Eck, Yoshua Bengio, Aaron C. Courville

The Indian Buffet Process is a Bayesian nonparametric approach that models objects as arising from an infinite number of latent factors.

Understanding the difficulty of training deep feedforward neural networks

no code implementations13 May 2010 Xavier Glorot, Yoshua Bengio

Whereas before 2006 it appears that deep multi-layer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures.
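The practical upshot of this paper is the "normalized" initialization now widely known as Glorot/Xavier initialization. A minimal NumPy sketch (the helper name glorot_uniform is ours):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Normalized ("Xavier") initialization from Glorot & Bengio (2010):
    keeps activation and gradient variances roughly constant across layers."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = glorot_uniform(784, 256)
print(W.std())  # close to sqrt(2 / (fan_in + fan_out))
```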

Adaptive Drift-Diffusion Process to Learn Time Intervals

1 code implementation11 Mar 2011 Francois Rivest, Yoshua Bengio

We provide an analytical proof that the model can learn inter-event intervals in a number of trials independent of the interval size and that the temporal precision of the system is proportional to the timed interval.

Algorithms for Hyper-Parameter Optimization

no code implementations NeurIPS 2011 James S. Bergstra, Rémi Bardenet, Yoshua Bengio, Balázs Kégl

Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs.

Image Classification
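For reference, the random-search baseline that the proposed sequential methods are compared against amounts to independent sampling from a prior over the search space; a sketch with a hypothetical stand-in objective:

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(lr, n_hidden):
    # Hypothetical stand-in for the validation error of a trained model.
    return (np.log10(lr) + 2.5) ** 2 + (n_hidden - 300) ** 2 / 1e5

# Random search: sample hyper-parameters independently from a prior.
trials = [{"lr": 10 ** rng.uniform(-5, 0), "n_hidden": rng.integers(50, 1000)}
          for _ in range(50)]
best = min(trials, key=lambda t: objective(t["lr"], t["n_hidden"]))
print(best)
```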

On Tracking The Partition Function

no code implementations NeurIPS 2011 Guillaume Desjardins, Yoshua Bengio, Aaron C. Courville

In this paper, we exploit the gradient descent training procedure of restricted Boltzmann machines (a type of MRF) to {\bf track} the log partition function during learning.

Shallow vs. Deep Sum-Product Networks

no code implementations NeurIPS 2011 Olivier Delalleau, Yoshua Bengio

We investigate the representational power of sum-product networks (computation networks analogous to neural networks, but whose individual units compute either products or weighted sums), through a theoretical analysis that compares deep (multiple hidden layers) vs. shallow (one hidden layer) architectures.

Representation Learning: A Review and New Perspectives

5 code implementations24 Jun 2012 Yoshua Bengio, Aaron Courville, Pascal Vincent

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data.

Density Estimation Representation Learning

Practical recommendations for gradient-based training of deep architectures

14 code implementations24 Jun 2012 Yoshua Bengio

Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters.

Better Mixing via Deep Representations

no code implementations18 Jul 2012 Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, Salah Rifai

It has previously been hypothesized, and supported with some experimental evidence, that deeper representations, when well trained, tend to do a better job at disentangling the underlying factors of variation.

Efficient EM Training of Gaussian Mixtures with Missing Data

1 code implementation4 Sep 2012 Olivier Delalleau, Aaron Courville, Yoshua Bengio

In data-mining applications, we are frequently faced with a large fraction of missing entries in the data matrix, which is problematic for most discriminant machine learning algorithms.

Disentangling Factors of Variation via Generative Entangling

no code implementations19 Oct 2012 Guillaume Desjardins, Aaron Courville, Yoshua Bengio

Seen from a generative perspective, the multiplicative interactions emulate the entangling of factors of variation.

General Classification

What Regularized Auto-Encoders Learn from the Data Generating Distribution

no code implementations18 Nov 2012 Guillaume Alain, Yoshua Bengio

This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density.

Denoising

On the difficulty of training Recurrent Neural Networks

no code implementations21 Nov 2012 Razvan Pascanu, Tomas Mikolov, Yoshua Bengio

There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994).
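For the exploding-gradient problem, the remedy proposed in this paper is gradient-norm clipping: rescale the whole gradient whenever its norm exceeds a threshold. A minimal NumPy sketch:

```python
import numpy as np

def clip_gradient_norm(grads, threshold=1.0):
    """Rescale all gradients jointly when their global norm exceeds
    the threshold, as proposed against exploding gradients."""
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if norm > threshold:
        grads = [g * (threshold / norm) for g in grads]
    return grads

grads = [np.full((3, 3), 10.0), np.full(3, 10.0)]
clipped = clip_gradient_norm(grads, threshold=5.0)
print(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))  # ~5.0
```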

Theano: new features and speed improvements

no code implementations23 Nov 2012 Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, Yoshua Bengio

Theano is a linear algebra compiler that optimizes a user's symbolically-specified mathematical computations to produce efficient low-level implementations.

BIG-bench Machine Learning

Advances in Optimizing Recurrent Networks

no code implementations4 Dec 2012 Yoshua Bengio, Nicolas Boulanger-Lewandowski, Razvan Pascanu

After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, several new developments will be reviewed here that have allowed substantial progress both in understanding and in technical solutions towards more efficient training of recurrent networks.

A Semantic Matching Energy Function for Learning with Multi-relational Data

no code implementations15 Jan 2013 Xavier Glorot, Antoine Bordes, Jason Weston, Yoshua Bengio

Large-scale relational learning becomes crucial for handling the huge amounts of structured data generated daily in many application domains ranging from computational biology or information retrieval, to natural language processing.

Information Retrieval Link Prediction +2

Revisiting Natural Gradient for Deep Networks

no code implementations16 Jan 2013 Razvan Pascanu, Yoshua Bengio

We evaluate natural gradient, an algorithm originally proposed in Amari (1997), for learning deep models.

Knowledge Matters: Importance of Prior Information for Optimization

1 code implementation17 Jan 2013 Çağlar Gülçehre, Yoshua Bengio

We explore the effect of introducing prior information into the intermediate level of neural networks for a learning task on which all the state-of-the-art machine learning algorithms tested failed to learn.

Unsupervised Pre-training

Maxout Networks

7 code implementations18 Feb 2013 Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio

We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout.

General Classification Image Classification
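The maxout unit itself is a pointwise maximum over k learned affine pieces, which makes it a natural companion to dropout's model averaging. A NumPy sketch (shapes chosen for illustration):

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit: elementwise max over k affine pieces.
    x: (d,), W: (k, d, m), b: (k, m) -> output (m,)."""
    return np.max(np.einsum("d,kdm->km", x, W) + b, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W = rng.normal(size=(3, 4, 2))   # k=3 pieces, 4 inputs, 2 outputs
b = rng.normal(size=(3, 2))
print(maxout(x, W, b))           # shape (2,)
```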

Deep Learning of Representations: Looking Forward

no code implementations2 May 2013 Yoshua Bengio

Deep learning research aims at discovering learning algorithms that discover multiple levels of distributed representations, with higher levels representing more abstract concepts.

Estimating or Propagating Gradients Through Stochastic Neurons

no code implementations14 May 2013 Yoshua Bengio

The second approach we propose assumes that an estimator of the gradient can be back-propagated; it provides an unbiased estimator of the gradient, but it only works with non-linearities that, unlike the hard threshold but like the rectifier, are not flat over their entire range.

Generalized Denoising Auto-Encoders as Generative Models

1 code implementation NeurIPS 2013 Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent

Recent work has shown how denoising and contractive autoencoders implicitly capture the structure of the data-generating density, in the case where the corruption noise is Gaussian, the reconstruction error is the squared error, and the data is continuous-valued.

Denoising

Deep Generative Stochastic Networks Trainable by Backprop

3 code implementations5 Jun 2013 Yoshua Bengio, Éric Thibodeau-Laufer, Guillaume Alain, Jason Yosinski

We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood.

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

1 code implementation15 Aug 2013 Yoshua Bengio, Nicholas Léonard, Aaron Courville

Stochastic neurons and hard non-linearities can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic or non-smooth neurons?
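The best-known of the estimators discussed here is the straight-through estimator: apply the hard non-linearity in the forward pass, then back-propagate as if it had been the identity. A minimal sketch (the saturation gate |a| <= 1 is one common variant, not a detail taken from this abstract):

```python
import numpy as np

def hard_threshold_forward(a):
    return (a > 0).astype(a.dtype)          # non-differentiable forward

def straight_through_backward(grad_out, a):
    """Straight-through estimator: treat the threshold as the identity
    in the backward pass, optionally gated to the region |a| <= 1."""
    return grad_out * (np.abs(a) <= 1.0)

a = np.array([-2.0, -0.3, 0.4, 1.7])
print(hard_threshold_forward(a))
print(straight_through_backward(np.ones_like(a), a))
```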

Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks

no code implementations7 Nov 2013 Caglar Gulcehre, Kyunghyun Cho, Razvan Pascanu, Yoshua Bengio

In this paper we propose and investigate a novel nonlinear unit, called $L_p$ unit, for deep neural networks.

Object Recognition

Bounding the Test Log-Likelihood of Generative Models

no code implementations24 Nov 2013 Yoshua Bengio, Li Yao, Kyunghyun Cho

Several interesting generative learning algorithms involve a complex probability distribution over many random variables, involving intractable normalization constants or latent variable marginalization.

Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs

no code implementations NeurIPS 2013 Yann Dauphin, Yoshua Bengio

Sparse high-dimensional data vectors are common in many application domains where a very large number of rarely non-zero features can be devised.

text-classification Text Classification +1

On the Challenges of Physical Implementations of RBMs

no code implementations18 Dec 2013 Vincent Dumoulin, Ian J. Goodfellow, Aaron Courville, Yoshua Bengio

Restricted Boltzmann machines (RBMs) are powerful machine learning models, but learning and some kinds of inference in the model require sampling-based approximations, which, in classical digital computers, are implemented using expensive MCMC.

Multimodal Transitions for Generative Stochastic Networks

no code implementations19 Dec 2013 Sherjil Ozair, Li Yao, Yoshua Bengio

Generative Stochastic Networks (GSNs) have been recently introduced as an alternative to traditional probabilistic modeling: instead of parametrizing the data distribution directly, one parametrizes a transition operator for a Markov chain whose stationary distribution is an estimator of the data generating distribution.

On the number of response regions of deep feed forward networks with piece-wise linear activations

no code implementations20 Dec 2013 Razvan Pascanu, Guido Montufar, Yoshua Bengio

For a $k$ layer model with $n$ hidden units on each layer it is $\Omega(\left\lfloor {n}/{n_0}\right\rfloor^{k-1}n^{n_0})$.
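To make the asymptotics concrete, the lower bound can be evaluated directly (a sketch that drops the constant hidden in the Omega):

```python
def region_lower_bound(k, n, n0):
    """Evaluate the paper's lower bound
    floor(n / n0)^(k-1) * n^n0 for a k-layer network with n hidden
    units per layer and n0 inputs (asymptotic constant omitted)."""
    return (n // n0) ** (k - 1) * n ** n0

# Depth multiplies the count exponentially, width only polynomially.
for k in (2, 3, 4):
    print(k, region_lower_bound(k, n=8, n0=2))  # 256, 1024, 4096
```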

How to Construct Deep Recurrent Neural Networks

no code implementations20 Dec 2013 Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio

Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996).

Language Modelling

An empirical analysis of dropout in piecewise linear networks

no code implementations21 Dec 2013 David Warde-Farley, Ian J. Goodfellow, Aaron Courville, Yoshua Bengio

The recently introduced dropout training criterion for neural networks has been the subject of much attention due to its simplicity and remarkable effectiveness as a regularizer, as well as its interpretation as a training procedure for an exponentially large ensemble of networks that share parameters.

On the Number of Linear Regions of Deep Neural Networks

no code implementations NeurIPS 2014 Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio

We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have.

On the saddle point problem for non-convex optimization

no code implementations19 May 2014 Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for the ability of these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.

Iterative Neural Autoregressive Distribution Estimator (NADE-k)

1 code implementation5 Jun 2014 Tapani Raiko, Li Yao, Kyunghyun Cho, Yoshua Bengio

Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data.

Density Estimation Image Generation +1

Generative Adversarial Networks

183 code implementations Proceedings of the 27th International Conference on Neural Information Processing Systems 2014 Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.

Super-Resolution Time-Series Few-Shot Learning with Heterogeneous Channels
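The minimax game described above translates almost line-for-line into code. A hedged PyTorch sketch on a toy 1-D density (the non-saturating generator loss used below is the practical variant the paper itself recommends):

```python
import torch
import torch.nn as nn

# Toy setup: G maps noise to 1-D samples; D classifies real vs. generated.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0          # data: N(2, 0.25)
    fake = G(torch.randn(64, 8))

    # D maximizes log D(real) + log(1 - D(fake)); detach() keeps this
    # update from flowing back into G.
    loss_d = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # G maximizes the probability that D makes a mistake.
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(torch.randn(256, 8)).mean().item())  # should drift toward 2.0
```

Both networks are deliberately tiny; the point is only the alternating update and the stop-gradient between the two players.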

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

4 code implementations NeurIPS 2014 Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.

Reweighted Wake-Sleep

2 code implementations11 Jun 2014 Jörg Bornschein, Yoshua Bengio

The wake-sleep algorithm relies on training not just the directed generative model but also a conditional generative model (the inference network) that runs backward from visible to latent, estimating the posterior distribution of latent given visible.

Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning

no code implementations28 Jun 2014 Kyunghyun Cho, Yoshua Bengio

Conditional computation has been proposed as a way to increase the capacity of a deep neural network without increasing the amount of computation required, by activating some parameters and computation "on-demand", on a per-example basis.

How Auto-Encoders Could Provide Credit Assignment in Deep Networks via Target Propagation

no code implementations29 Jul 2014 Yoshua Bengio

We propose to exploit {\em reconstruction} as a layer-local training signal for deep learning.

On the Equivalence Between Deep NADE and Generative Stochastic Networks

no code implementations2 Sep 2014 Li Yao, Sherjil Ozair, Kyunghyun Cho, Yoshua Bengio

Orderless NADEs are trained based on a criterion that stochastically maximizes $P(\mathbf{x})$ with all possible orders of factorizations.

Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation

no code implementations WS 2014 Jean Pouget-Abadie, Dzmitry Bahdanau, Bart van Merrienboer, Kyunghyun Cho, Yoshua Bengio

The authors of (Cho et al., 2014a) have shown that the recently introduced neural network translation systems suffer from a significant drop in translation quality when translating long sentences, unlike existing phrase-based translation systems.

Machine Translation Sentence +1

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches

2 code implementations3 Sep 2014 Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio

In this paper, we focus on analyzing the properties of the neural machine translation using two models: RNN Encoder--Decoder and a newly proposed gated recursive convolutional neural network.

Decoder Machine Translation +2

Deep Tempering

no code implementations1 Oct 2014 Guillaume Desjardins, Heng Luo, Aaron Courville, Yoshua Bengio

Restricted Boltzmann Machines (RBMs) are one of the fundamental building blocks of deep learning.

Deep Directed Generative Autoencoders

no code implementations2 Oct 2014 Sherjil Ozair, Yoshua Bengio

The objective is to learn an encoder $f(\cdot)$ that maps $X$ to $f(X)$ that has a much simpler distribution than $X$ itself, estimated by $P(H)$.

Decoder

Not All Neural Embeddings are Born Equal

no code implementations2 Oct 2014 Felix Hill, Kyunghyun Cho, Sebastien Jean, Coline Devin, Yoshua Bengio

Neural language models learn word representations that capture rich linguistic and conceptual information.

Machine Translation Translation

BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

2 code implementations9 Oct 2014 Stephan Gouws, Yoshua Bengio, Greg Corrado

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data.

Cross-Lingual Document Classification Document Classification +3

NICE: Non-linear Independent Components Estimation

19 code implementations30 Oct 2014 Laurent Dinh, David Krueger, Yoshua Bengio

It is based on the idea that a good representation is one in which the data has a distribution that is easy to model.

Ranked #73 on Image Generation on CIFAR-10 (bits/dimension metric)

Image Generation
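NICE builds its easy-to-model representation from additive coupling layers, which are invertible by construction and have a unit-determinant Jacobian; a toy NumPy sketch (the tanh map stands in for the paper's learned coupling MLP):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2)) * 0.1  # toy coupling function m(x1) = tanh(x1 @ W)

def coupling_forward(x):
    """Additive coupling layer (NICE): y1 = x1, y2 = x2 + m(x1).
    The Jacobian is triangular with unit diagonal, so log|det| = 0."""
    x1, x2 = x[:2], x[2:]
    return np.concatenate([x1, x2 + np.tanh(x1 @ W)])

def coupling_inverse(y):
    y1, y2 = y[:2], y[2:]
    return np.concatenate([y1, y2 - np.tanh(y1 @ W)])

x = rng.normal(size=4)
print(np.allclose(coupling_inverse(coupling_forward(x)), x))  # True
```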

How transferable are features in deep neural networks?

3 code implementations NeurIPS 2014 Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson

Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks.

Specificity

Iterative Neural Autoregressive Distribution Estimator NADE-k

1 code implementation NeurIPS 2014 Tapani Raiko, Li Yao, Kyunghyun Cho, Yoshua Bengio

Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data.

Density Estimation Image Generation +1

Generative Adversarial Nets

1 code implementation NeurIPS 2014 Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

We propose a new framework for estimating generative models via adversarial nets, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.

End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results

no code implementations4 Dec 2014 Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

We replace the Hidden Markov Model (HMM) which is traditionally used in continuous speech recognition with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes.

Decoder speech-recognition +1

On Using Very Large Target Vocabulary for Neural Machine Translation

1 code implementation IJCNLP 2015 Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio

The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models.

Machine Translation Translation

Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

4 code implementations17 Dec 2014 Grégoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio

Sentiment analysis is a common task in natural language processing that aims to detect polarity of a text document (typically a consumer review).

Binary Classification General Classification +1

FitNets: Hints for Thin Deep Nets

3 code implementations19 Dec 2014 Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio

In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.

Knowledge Distillation

Embedding Word Similarity with Neural Machine Translation

no code implementations19 Dec 2014 Felix Hill, Kyunghyun Cho, Sebastien Jean, Coline Devin, Yoshua Bengio

Here we investigate the embeddings learned by neural machine translation models, a recently-developed class of neural language model.

Language Modelling Machine Translation +2

Training deep neural networks with low precision multiplications

1 code implementation22 Dec 2014 Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David

For each of those datasets and for each of those formats, we assess the impact of the precision of the multiplications on the final error after training.

Difference Target Propagation

1 code implementation23 Dec 2014 Dong-Hyun Lee, Saizheng Zhang, Asja Fischer, Yoshua Bengio

Back-propagation has been the workhorse of recent successes of deep learning but it relies on infinitesimal effects (partial derivatives) in order to perform credit assignment.

ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient

no code implementations23 Dec 2014 Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio

The convergence of SGD depends on the careful choice of learning rate and the amount of the noise in stochastic estimates of the gradients.

Gated Feedback Recurrent Neural Networks

no code implementations9 Feb 2015 Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio

In this work, we propose a novel recurrent neural network (RNN) architecture.

Language Modelling

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

88 code implementations10 Feb 2015 Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio

Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images.

Caption Generation Image Captioning +1
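Deterministic "soft" attention reduces to a softmax over per-location scores followed by a weighted average of the CNN annotation vectors. A NumPy sketch with additive scoring (all dimensions here are illustrative):

```python
import numpy as np

def soft_attention(annotations, query, W_a, W_q, v):
    """Soft attention in the spirit of Show, Attend and Tell: score each
    spatial annotation vector against the decoder state, softmax the
    scores, and return the expected (weighted-average) context vector."""
    scores = np.tanh(annotations @ W_a + query @ W_q) @ v   # (L,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                    # attention weights
    return alpha @ annotations, alpha                       # context, weights

rng = np.random.default_rng(0)
a = rng.normal(size=(196, 512))          # 14x14 CNN feature map, flattened
h = rng.normal(size=256)                 # decoder hidden state
ctx, alpha = soft_attention(a, h, rng.normal(size=(512, 128)),
                            rng.normal(size=(256, 128)), rng.normal(size=128))
print(ctx.shape, alpha.sum())            # (512,) 1.0
```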

Towards Biologically Plausible Deep Learning

no code implementations14 Feb 2015 Yoshua Bengio, Dong-Hyun Lee, Jorg Bornschein, Thomas Mesnard, Zhouhan Lin

Neuroscientists have long criticised deep learning algorithms as incompatible with current knowledge of neurobiology.

Denoising Representation Learning

Equilibrated adaptive learning rates for non-convex optimization

2 code implementations NeurIPS 2015 Yann N. Dauphin, Harm de Vries, Yoshua Bengio

Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks.

On Using Monolingual Corpora in Neural Machine Translation

no code implementations11 Mar 2015 Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio

Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation.

Machine Translation Translation

GSNs : Generative Stochastic Networks

no code implementations18 Mar 2015 Guillaume Alain, Yoshua Bengio, Li Yao, Jason Yosinski, Eric Thibodeau-Laufer, Saizheng Zhang, Pascal Vincent

We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood.

Denoising

Learning to Understand Phrases by Embedding the Dictionary

2 code implementations TACL 2016 Felix Hill, Kyunghyun Cho, Anna Korhonen, Yoshua Bengio

Distributional models that learn rich semantic word representations are a success story of recent NLP research.

General Knowledge

A Recurrent Latent Variable Model for Sequential Data

5 code implementations NeurIPS 2015 Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio

In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder.

Bidirectional Helmholtz Machines

1 code implementation12 Jun 2015 Jorg Bornschein, Samira Shabanian, Asja Fischer, Yoshua Bengio

We present a lower-bound for the likelihood of this model and we show that optimizing this bound regularizes the model so that the Bhattacharyya distance between the bottom-up and top-down approximate distributions is minimized.

Attention-Based Models for Speech Recognition

14 code implementations NeurIPS 2015 Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio

Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation.

Machine Translation Speech Recognition +1

Describing Multimedia Content using Attention-based Encoder--Decoder Networks

no code implementations4 Jul 2015 Kyunghyun Cho, Aaron Courville, Yoshua Bengio

Whereas deep neural networks were first mostly used for classification tasks, they are rapidly expanding in the realm of structured output problems, where the observed target is composed of multiple random variables that have a rich joint distribution, given the input.

Caption Generation Decoder +4

A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion

4 code implementations8 Jul 2015 Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob G. Simonsen, Jian-Yun Nie

Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity.

Decoder

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

7 code implementations17 Jul 2015 Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau

We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models.

Decoder Word Embeddings

Clustering is Efficient for Approximate Maximum Inner Product Search

no code implementations21 Jul 2015 Alex Auvolat, Sarath Chandar, Pascal Vincent, Hugo Larochelle, Yoshua Bengio

Efficient Maximum Inner Product Search (MIPS) is an important task that has a wide applicability in recommendation systems and classification with a large number of classes.

Clustering Recommendation Systems +2

Artificial Neural Networks Applied to Taxi Destination Prediction

1 code implementation31 Jul 2015 Alexandre de Brébisson, Étienne Simon, Alex Auvolat, Pascal Vincent, Yoshua Bengio

We describe our first-place solution to the ECML/PKDD discovery challenge on taxi destination prediction.

End-to-End Attention-based Large Vocabulary Speech Recognition

1 code implementation18 Aug 2015 Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio

Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs).

Acoustic Modelling Language Modelling +2

STDP as presynaptic activity times rate of change of postsynaptic activity

no code implementations19 Sep 2015 Yoshua Bengio, Thomas Mesnard, Asja Fischer, Saizheng Zhang, Yuhuai Wu

We introduce a weight update formula that is expressed only in terms of firing rates and their derivatives and that results in changes consistent with those associated with spike-timing dependent plasticity (STDP) rules and biological observations, even though the explicit timing of spikes is not needed.

Batch Normalized Recurrent Neural Networks

no code implementations5 Oct 2015 César Laurent, Gabriel Pereyra, Philémon Brakel, Ying Zhang, Yoshua Bengio

Recurrent Neural Networks (RNNs) are powerful models for sequential data that have the potential to learn long-term dependencies.

Language Modelling speech-recognition +1

Early Inference in Energy-Based Models Approximates Back-Propagation

no code implementations9 Oct 2015 Yoshua Bengio, Asja Fischer

We show that Langevin MCMC inference in an energy-based model with latent variables has the property that the early steps of inference, starting from a stationary point, correspond to propagating error gradients into internal layers, similarly to back-propagation.

BinaryConnect: Training Deep Neural Networks with binary weights during propagations

5 code implementations NeurIPS 2015 Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David

We introduce BinaryConnect, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated.
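The core of BinaryConnect fits in a few lines: binarize on the fly for the propagations, but accumulate gradient updates in a full-precision copy of the weights. A hedged NumPy sketch on a toy linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
W_real = rng.uniform(-1, 1, size=(4, 3))   # full-precision "master" weights

def binarize(W):
    return np.where(W >= 0, 1.0, -1.0)      # deterministic sign binarization

for step in range(100):
    W_bin = binarize(W_real)                # binary weights for forward/backward
    x, target = rng.normal(size=4), rng.normal(size=3)
    err = x @ W_bin - target
    grad = np.outer(x, err)                 # gradient w.r.t. the (binary) weights
    W_real -= 0.01 * grad                   # ...accumulated in full precision
    W_real = np.clip(W_real, -1, 1)         # keep master weights in [-1, 1]

print(binarize(W_real))
```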

Oracle performance for visual captioning

1 code implementation14 Nov 2015 Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio

The task of associating images and videos with a natural language description has attracted a great amount of attention recently.

Image Captioning Language Modelling +1

Deconstructing the Ladder Network Architecture

no code implementations19 Nov 2015 Mohammad Pezeshki, Linxi Fan, Philemon Brakel, Aaron Courville, Yoshua Bengio

Although the empirical results are impressive, the Ladder Network has many components intertwined, whose contributions are not obvious in such a complex architecture.

Denoising

Denoising Criterion for Variational Auto-Encoding Framework

no code implementations19 Nov 2015 Daniel Jiwoong Im, Sungjin Ahn, Roland Memisevic, Yoshua Bengio

Denoising autoencoders (DAE) are trained to reconstruct their clean inputs with noise injected at the input level, while variational autoencoders (VAE) are trained with noise injected in their stochastic hidden layer, with a regularizer that encourages this noise injection.

Denoising

Task Loss Estimation for Sequence Prediction

1 code implementation19 Nov 2015 Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio

Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss.

Decoder Language Modelling +2

Unitary Evolution Recurrent Neural Networks

2 code implementations20 Nov 2015 Martin Arjovsky, Amar Shah, Yoshua Bengio

When the eigenvalues of the hidden to hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies.

Sequential Image Classification

Variance Reduction in SGD by Distributed Importance Sampling

1 code implementation20 Nov 2015 Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, Yoshua Bengio

This leads the model to update using an unbiased estimate of the gradient which also has minimum variance when the sampling proposal is proportional to the L2-norm of the gradient.
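The unbiased reweighting is easy to state in code: sample example i with probability p_i proportional to its gradient norm, then scale its gradient by 1/(N p_i). The sketch below computes every per-example gradient, which defeats the purpose (hence the paper's distributed design), but it shows the estimator staying unbiased:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1., -2., 0.5, 0., 3.]) + 0.1 * rng.normal(size=1000)
w = np.zeros(5)

for step in range(500):
    # Per-example gradient norms define the importance-sampling proposal.
    g = (X @ w - y)[:, None] * X                 # per-example gradients
    p = np.linalg.norm(g, axis=1)
    p /= p.sum()
    i = rng.choice(len(X), p=p)
    # Reweight by 1/(N * p_i) so the expected update is the mean gradient.
    w -= 0.01 * g[i] / (len(X) * p[i])

print(w)  # drifts toward [1, -2, 0.5, 0, 3]
```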

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

26 code implementations9 Feb 2016 Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time.

Equilibrium Propagation: Bridging the Gap Between Energy-Based Models and Backpropagation

2 code implementations16 Feb 2016 Benjamin Scellier, Yoshua Bengio

Because the objective function is defined in terms of local perturbations, the second phase of Equilibrium Propagation corresponds to only nudging the prediction (fixed point, or stationary distribution) towards a configuration that reduces prediction error.

Noisy Activation Functions

1 code implementation1 Mar 2016 Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio

Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only).

A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation

2 code implementations ACL 2016 Junyoung Chung, Kyunghyun Cho, Yoshua Bengio

The existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation.

Decoder Machine Translation +2

Pointing the Unknown Words

no code implementations ACL 2016 Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bo-Wen Zhou, Yoshua Bengio

At each time-step, the decision of which softmax layer to use is made adaptively by an MLP conditioned on the context. We motivate our work with psychological evidence that humans naturally tend to point towards objects in the context or the environment when the name of an object is not known. We observe improvements on two tasks: neural machine translation on the Europarl English-to-French parallel corpora and text summarization on the Gigaword dataset using our proposed model.

Machine Translation Sentence +2

Theano: A Python framework for fast computation of mathematical expressions

1 code implementation9 May 2016 The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang

Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.

BIG-bench Machine Learning Clustering +2

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

9 code implementations19 May 2016 Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio

Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue.

Decoder Response Generation

Hierarchical Memory Networks

no code implementations24 May 2016 Sarath Chandar, Sungjin Ahn, Hugo Larochelle, Pascal Vincent, Gerald Tesauro, Yoshua Bengio

In this paper, we explore a form of hierarchical memory network, which can be considered as a hybrid between hard and soft attention memory networks.

Hard Attention Question Answering

Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

4 code implementations2 Jun 2016 Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bo-Wen Zhou, Yoshua Bengio, Aaron Courville

We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens.

Dialogue Generation Response Generation

Feedforward Initialization for Fast Inference of Deep Generative Networks is biologically plausible

no code implementations6 Jun 2016 Yoshua Bengio, Benjamin Scellier, Olexa Bilaniuk, Joao Sacramento, Walter Senn

We find conditions under which a simple feedforward computation is a very good initialization for inference, after the input units are clamped to observed values.

Iterative Alternating Neural Attention for Machine Reading

1 code implementation7 Jun 2016 Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio

We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document.

Ranked #3 on Question Answering on Children's Book Test (Accuracy-NE metric)

Question Answering Reading Comprehension

Deep Directed Generative Models with Energy-Based Probability Estimation

no code implementations10 Jun 2016 Taesup Kim, Yoshua Bengio

Training energy-based probabilistic models is confronted with apparently intractable sums, whose Monte Carlo estimation requires sampling from the estimated probability distribution in the inner loop of training.

Online and Offline Handwritten Chinese Character Recognition: A Comprehensive Study and New Benchmark

no code implementations18 Jun 2016 Xu-Yao Zhang, Yoshua Bengio, Cheng-Lin Liu

Furthermore, although directMap+convNet can achieve the best results and surpass human-level performance, we show that writer adaptation in this case is still effective.

Data Augmentation Offline Handwritten Chinese Character Recognition

Drawing and Recognizing Chinese Characters with Recurrent Neural Network

1 code implementation21 Jun 2016 Xu-Yao Zhang, Fei Yin, Yan-Ming Zhang, Cheng-Lin Liu, Yoshua Bengio

In this paper, we propose a framework by using the recurrent neural network (RNN) as both a discriminative model for recognizing Chinese characters and a generative model for drawing (generating) Chinese characters.

Handwriting Recognition

On Multiplicative Integration with Recurrent Neural Networks

no code implementations NeurIPS 2016 Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, Ruslan Salakhutdinov

We introduce a general and simple structural design called Multiplicative Integration (MI) to improve recurrent neural networks (RNNs).

Language Modelling

Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes

no code implementations30 Jun 2016 Caglar Gulcehre, Sarath Chandar, Kyunghyun Cho, Yoshua Bengio

We investigate the mechanisms and effects of learning to read from and write to a memory through experiments on the Facebook bAbI tasks, using both a feedforward and a GRU controller.

Natural Language Inference Question Answering

Context-Dependent Word Representation for Neural Machine Translation

1 code implementation3 Jul 2016 Heeyoul Choi, Kyunghyun Cho, Yoshua Bengio

Based on this observation, in this paper we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence.

Decoder Machine Translation +2

HeMIS: Hetero-Modal Image Segmentation

1 code implementation18 Jul 2016 Mohammad Havaei, Nicolas Guizard, Nicolas Chapados, Yoshua Bengio

We introduce a deep learning image segmentation framework that is extremely robust to missing imaging modalities.

Image Segmentation Imputation +2

A Neural Knowledge Language Model

no code implementations1 Aug 2016 Sungjin Ahn, Heeyoul Choi, Tanel Pärnamaa, Yoshua Bengio

Current language models have a significant limitation in the ability to encode and decode factual knowledge.

Language Modelling

Mollifying Networks

no code implementations17 Aug 2016 Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio

The optimization of deep neural networks can be more challenging than traditional convex optimization problems due to the highly non-convex nature of the loss function, e.g., it can involve pathological landscapes such as saddle-surfaces that can be difficult to escape for algorithms based on simple gradient descent.

Recurrent Neural Networks With Limited Numerical Precision

1 code implementation24 Aug 2016 Joachim Ott, Zhouhan Lin, Ying Zhang, Shih-Chii Liu, Yoshua Bengio

We present results from the use of different stochastic and deterministic reduced precision training methods applied to three major RNN types which are then tested on several datasets.

Binarization

Hierarchical Multiscale Recurrent Neural Networks

3 code implementations6 Sep 2016 Junyoung Chung, Sungjin Ahn, Yoshua Bengio

Multiscale recurrent neural networks have been considered as a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of models can actually capture the temporal dependencies by discovering the latent hierarchical structure of the sequence.

Language Modelling

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

5 code implementations22 Sep 2016 Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved comparable accuracy as their 32-bit counterparts using only 4-bits.

Professor Forcing: A New Algorithm for Training Recurrent Networks

1 code implementation NeurIPS 2016 Alex Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, Yoshua Bengio

We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps.

Domain Adaptation Handwriting generation +2

Recurrent Neural Networks With Limited Numerical Precision

1 code implementation21 Nov 2016 Joachim Ott, Zhouhan Lin, Ying Zhang, Shih-Chii Liu, Yoshua Bengio

Recurrent Neural Networks (RNNs) produce state-of-art performance on many machine learning tasks but their demand on resources in terms of memory and computational power are often high.

Quantization

Invariant Representations for Noisy Speech Recognition

no code implementations27 Nov 2016 Dmitriy Serdyuk, Kartik Audhkhasi, Philémon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio

Ensuring such robustness to variability is a challenge in modern day neural network-based ASR systems, especially when all types of variability are not seen during training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Diet Networks: Thin Parameters for Fat Genomics

5 code implementations28 Nov 2016 Adriana Romero, Pierre Luc Carrier, Akram Erraqabi, Tristan Sylvain, Alex Auvolat, Etienne Dejoie, Marc-André Legault, Marie-Pierre Dubé, Julie G. Hussin, Yoshua Bengio

It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g., for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units).

Parameter Prediction

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space

1 code implementation CVPR 2017 Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski

PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable "condition" network C that tells the generator what to draw.

Image Captioning Image Inpainting

Mode Regularized Generative Adversarial Networks

no code implementations7 Dec 2016 Tong Che, Yan-ran Li, Athul Paul Jacob, Yoshua Bengio, Wenjie Li

Although Generative Adversarial Networks achieve state-of-the-art results on a variety of generative tasks, they are regarded as highly unstable and prone to miss modes.

Generalizable Features From Unsupervised Learning

no code implementations12 Dec 2016 Mehdi Mirza, Aaron Courville, Yoshua Bengio

In this work, we explore the potential of unsupervised learning to find features that promote better generalization to settings outside the supervised training distribution.

Physical Intuition

On Random Weights for Texture Generation in One Layer Neural Networks

no code implementations19 Dec 2016 Mihir Mongia, Kundan Kumar, Akram Erraqabi, Yoshua Bengio

Recent work in the literature has shown experimentally that one can use the lower layers of a trained convolutional neural network (CNN) to model natural textures.

Texture Synthesis

Memory Augmented Neural Networks with Wormhole Connections

no code implementations30 Jan 2017 Caglar Gulcehre, Sarath Chandar, Yoshua Bengio

We use discrete addressing for read/write operations, which helps to substantially reduce the vanishing gradient problem with very long sequences.

Learning Normalized Inputs for Iterative Estimation in Medical Image Segmentation

no code implementations16 Feb 2017 Michal Drozdzal, Gabriel Chartrand, Eugene Vorontsov, Lisa Di Jorio, An Tang, Adriana Romero, Yoshua Bengio, Chris Pal, Samuel Kadoury

Moreover, when applying our 2D pipeline on a challenging 3D MRI prostate segmentation challenge we reach results that are competitive even when compared to 3D methods.

Image Segmentation Medical Image Segmentation +2

Maximum-Likelihood Augmented Discrete Generative Adversarial Networks

no code implementations26 Feb 2017 Tong Che, Yan-ran Li, Ruixiang Zhang, R. Devon Hjelm, Wenjie Li, Yangqiu Song, Yoshua Bengio

Despite the successes in capturing continuous distributions, the application of generative adversarial networks (GANs) to discrete settings, like natural language tasks, is rather restricted.

Boundary-Seeking Generative Adversarial Networks

6 code implementations27 Feb 2017 R. Devon Hjelm, Athul Paul Jacob, Tong Che, Adam Trischler, Kyunghyun Cho, Yoshua Bengio

We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator.

Scene Understanding Text Generation

A Robust Adaptive Stochastic Gradient Method for Deep Learning

1 code implementation2 Mar 2017 Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

The information about the element-wise curvature of the loss function is estimated from the local statistics of the stochastic first order gradients.

Sharp Minima Can Generalize For Deep Nets

no code implementations ICML 2017 Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio

Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice.

Independently Controllable Features

no code implementations22 Mar 2017 Emmanuel Bengio, Valentin Thomas, Joelle Pineau, Doina Precup, Yoshua Bengio

Finding features that disentangle the different causes of variation in real data is a difficult task, that has nonetheless received considerable attention in static domains like natural images.

A network of deep neural networks for distant speech recognition

no code implementations23 Mar 2017 Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially when adverse acoustic conditions characterized by non-stationary noises and reverberation are met.

Distant Speech Recognition Speech Enhancement +1

Image Segmentation by Iterative Inference from Conditional Score Estimation

1 code implementation ICLR 2018 Adriana Romero, Michal Drozdzal, Akram Erraqabi, Simon Jégou, Yoshua Bengio

We experimentally find that the proposed iterative inference from conditional score estimation by conditional denoising autoencoders performs better than comparable models based on CRFs or those not using any explicit modeling of the conditional joint distribution of outputs.

Denoising Image Segmentation +1

Deep Complex Networks

9 code implementations ICLR 2018 Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J. Pal

Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models.

Image Classification Music Transcription +1

Deep Learning for Patient-Specific Kidney Graft Survival Analysis

2 code implementations29 May 2017 Margaux Luck, Tristan Sylvain, Héloïse Cardinal, Andrea Lodi, Yoshua Bengio

An accurate model of patient-specific kidney graft survival distributions can help to improve shared-decision making in the treatment and care of patients.

Decision Making Multi-Task Learning +1

Gated Orthogonal Recurrent Units: On Learning to Forget

1 code implementation8 Jun 2017 Li Jing, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio

We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory.

Ranked #7 on Question Answering on bAbi (Accuracy (trained on 1k) metric)

Denoising Question Answering

Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder

1 code implementation13 Jun 2017 Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio

We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation.

Decoder Machine Translation +1

Variance Regularizing Adversarial Learning

no code implementations ICLR 2018 Karan Grewal, R. Devon Hjelm, Yoshua Bengio

We hypothesize that this approach ensures a non-zero gradient to the generator, even in the limit of a perfect classifier.

Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition

no code implementations19 Jul 2017 Taesup Kim, Inchul Song, Yoshua Bengio

Layer normalization is a recently introduced technique for normalizing the activities of neurons in deep neural networks to improve the training speed and stability.

speech-recognition Speech Recognition

Independently Controllable Factors

no code implementations3 Aug 2017 Valentin Thomas, Jules Pondard, Emmanuel Bengio, Marc Sarfati, Philippe Beaudoin, Marie-Jean Meurs, Joelle Pineau, Doina Precup, Yoshua Bengio

It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation.

Open-Ended Question Answering

The Consciousness Prior

1 code implementation25 Sep 2017 Yoshua Bengio

To the extent that these assumptions are generally true (and the form of natural language seems consistent with them), they can form a useful prior for representation learning.

Decision Making Representation Learning +1

Improving speech recognition by revising gated recurrent units

1 code implementation29 Sep 2017 Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

First, we suggest to remove the reset gate in the GRU design, resulting in a more efficient single-gate architecture.

speech-recognition Speech Recognition

Residual Connections Encourage Iterative Inference

no code implementations ICLR 2018 Stanisław Jastrzębski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio

In general, a Resnet block tends to concentrate representation learning behavior in the first few layers while higher layers perform iterative refinement of features.

Representation Learning

Learning Independent Features with Adversarial Nets for Non-linear ICA

1 code implementation ICLR 2018 Philemon Brakel, Yoshua Bengio

We propose to learn independent features with adversarial objectives which optimize such measures implicitly.

Generalization in Deep Learning

no code implementations16 Oct 2017 Kenji Kawaguchi, Leslie Pack Kaelbling, Yoshua Bengio

This paper provides theoretical insights into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, responding to an open question in the literature.

Open-Ended Question Answering

Graph Attention Networks

90 code implementations ICLR 2018 Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.

 Ranked #1 on Node Classification on Pubmed (Validation metric)

Document Classification Graph Attention +8
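A single attention head is just a masked softmax over edge scores followed by neighbourhood aggregation. A NumPy sketch of one head, using the standard split of the attention vector into source and destination parts:

```python
import numpy as np

def gat_layer(H, A, W, a_src, a_dst):
    """One graph-attention head: score each edge with a shared attention
    vector, softmax over each node's neighbourhood (masked by A), then
    aggregate the transformed neighbour features."""
    Z = H @ W                                        # (N, F')
    scores = Z @ a_src[:, None] + (Z @ a_dst)[None]  # e_ij before activation
    e = np.where(A > 0, np.maximum(0.2 * scores, scores), -np.inf)  # LeakyReLU + mask
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ Z

N, F, Fp = 5, 8, 4
rng = np.random.default_rng(0)
A = np.eye(N) + (rng.random((N, N)) > 0.5)          # adjacency with self-loops
H = rng.normal(size=(N, F))
out = gat_layer(H, A, rng.normal(size=(F, Fp)),
                rng.normal(size=Fp), rng.normal(size=Fp))
print(out.shape)  # (5, 4)
```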

Fraternal Dropout

1 code implementation ICLR 2018 Konrad Zolna, Devansh Arpit, Dendi Suhubdy, Yoshua Bengio

We show that our regularization term is upper bounded by the expectation-linear dropout objective which has been shown to address the gap due to the difference between the train and inference phases of dropout.

Image Captioning Language Modelling

Sparse Attentive Backtracking: Long-Range Credit Assignment in Recurrent Networks

no code implementations ICLR 2018 Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Laurent Charlin, Chris Pal, Yoshua Bengio

A major drawback of backpropagation through time (BPTT) is the difficulty of learning long-term dependencies, coming from having to propagate credit information backwards through every single step of the forward computation.

Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net

1 code implementation NeurIPS 2017 Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio

The energy function is then modified so the model and data distributions match, with no guarantee on the number of steps required for the Markov chain to converge.

Three Factors Influencing Minima in SGD

no code implementations ICLR 2018 Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

In particular we find that the ratio of learning rate to batch size is a key determinant of SGD dynamics and of the width of the final minima, and that higher values of the ratio lead to wider minima and often better generalization.

Memorization Open-Ended Question Answering

ACtuAL: Actor-Critic Under Adversarial Learning

no code implementations13 Nov 2017 Anirudh Goyal, Nan Rosemary Ke, Alex Lamb, R. Devon Hjelm, Chris Pal, Joelle Pineau, Yoshua Bengio

This makes it fundamentally difficult to train GANs with discrete data, as generation in this case typically involves a non-differentiable function.

Language Modelling

Variational Bi-LSTMs

no code implementations ICLR 2018 Samira Shabanian, Devansh Arpit, Adam Trischler, Yoshua Bengio

Bidirectional LSTMs (Bi-LSTMs) on the other hand model sequences along both forward and backward directions and are generally known to perform better at such tasks because they capture a richer representation of the data.

Equivalence of Equilibrium Propagation and Recurrent Backpropagation

1 code implementation22 Nov 2017 Benjamin Scellier, Yoshua Bengio

Recurrent Backpropagation and Equilibrium Propagation are supervised learning algorithms for fixed point recurrent neural networks which differ in their second phase.

Measuring the tendency of CNNs to Learn Surface Statistical Regularities

1 code implementation30 Nov 2017 Jason Jo, Yoshua Bengio

The goal of this article is to measure the tendency of CNNs to learn surface statistical regularities of the dataset.

ObamaNet: Photo-realistic lip-sync from text

1 code implementation6 Dec 2017 Rithesh Kumar, Jose Sotelo, Kundan Kumar, Alexandre de Brebisson, Yoshua Bengio

We present ObamaNet, the first architecture that generates both audio and synchronized photo-realistic lip-sync videos from any new text.

Constrained Lip-synchronization

GibbsNet: Iterative Adversarial Inference for Deep Graphical Models

no code implementations NeurIPS 2017 Alex Lamb, Devon Hjelm, Yaroslav Ganin, Joseph Paul Cohen, Aaron Courville, Yoshua Bengio

Directed latent variable models that formulate the joint distribution as $p(x, z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling.

Attribute

Dendritic error backpropagation in deep cortical microcircuits

1 code implementation30 Dec 2017 João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn

Animal behaviour depends on learning to associate sensory stimuli with the desired motor command.

Denoising

Boundary Seeking GANs

no code implementations ICLR 2018 R. Devon Hjelm, Athul Paul Jacob, Adam Trischler, Gerry Che, Kyunghyun Cho, Yoshua Bengio

We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator.

Scene Understanding Text Generation

Learning Generative Models with Locally Disentangled Latent Factors

no code implementations ICLR 2018 Brady Neal, Alex Lamb, Sherjil Ozair, Devon Hjelm, Aaron Courville, Yoshua Bengio, Ioannis Mitliagkas

One of the most successful techniques in generative models has been decomposing a complicated generation task into a series of simpler generation tasks.

Combining Model-based and Model-free RL via Multi-step Control Variates

no code implementations ICLR 2018 Tong Che, Yuchen Lu, George Tucker, Surya Bhupatiraju, Shane Gu, Sergey Levine, Yoshua Bengio

Model-free deep reinforcement learning algorithms are able to successfully solve a wide range of continuous control tasks, but typically require many on-policy samples to achieve good performance.

Continuous Control OpenAI Gym
