no code implementations • NeurIPS 2007 • Nicolas L. Roux, Yoshua Bengio, Pascal Lamblin, Marc Joliveau, Balázs Kégl
We study the following question: is the two-dimensional structure of images a very strong prior or is it something that can be learned with a few examples of natural images?
no code implementations • NeurIPS 2007 • Nicolas L. Roux, Pierre-Antoine Manzagol, Yoshua Bengio
Guided by the goal of obtaining an optimization algorithm that is both fast and yields good generalization, we study the descent direction that maximizes the decrease in generalization error, or the probability of not increasing it.
no code implementations • NeurIPS 2009 • Yoshua Bengio, James S. Bergstra
We introduce a new type of neural network activation function based on recent physiological rate models for complex cells in visual area V1.
no code implementations • NeurIPS 2009 • Douglas Eck, Yoshua Bengio, Aaron C. Courville
The Indian Buffet Process is a Bayesian nonparametric approach that models objects as arising from an infinite number of latent factors.
no code implementations • 13 May 2010 • Xavier Glorot, Yoshua Bengio
Whereas before 2006 it appears that deep multi-layer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures.
1 code implementation • 11 Mar 2011 • Francois Rivest, Yoshua Bengio
We provide an analytical proof that the model can learn inter-event intervals in a number of trials independent of the interval size and that the temporal precision of the system is proportional to the timed interval.
no code implementations • NeurIPS 2011 • James S. Bergstra, Rémi Bardenet, Yoshua Bengio, Balázs Kégl
Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs.
no code implementations • NeurIPS 2011 • Guillaume Desjardins, Yoshua Bengio, Aaron C. Courville
In this paper, we exploit the gradient descent training procedure of restricted Boltzmann machines (a type of MRF) to {\bf track} the log partition function during learning.
no code implementations • NeurIPS 2011 • Olivier Delalleau, Yoshua Bengio
We investigate the representational power of sum-product networks (computation networks analogous to neural networks, but whose individual units compute either products or weighted sums), through a theoretical analysis that compares deep (multiple hidden layers) vs. shallow (one hidden layer) architectures.
5 code implementations • 24 Jun 2012 • Yoshua Bengio, Aaron Courville, Pascal Vincent
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide, to varying degrees, the different explanatory factors of variation behind the data.
14 code implementations • 24 Jun 2012 • Yoshua Bengio
Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters.
no code implementations • 27 Jun 2012 • Ian Goodfellow, Aaron Courville, Yoshua Bengio
We consider the problem of object recognition with a large number of classes.
no code implementations • 27 Jun 2012 • Nicolas Boulanger-Lewandowski, Yoshua Bengio, Pascal Vincent
We investigate the problem of modeling symbolic sequences of polyphonic music in a completely general piano-roll representation.
Ranked #5 on Music Modeling on JSB Chorales
no code implementations • 18 Jul 2012 • Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, Salah Rifai
It has previously been hypothesized, and supported with some experimental evidence, that deeper representations, when well trained, tend to do a better job at disentangling the underlying factors of variation.
1 code implementation • 4 Sep 2012 • Olivier Delalleau, Aaron Courville, Yoshua Bengio
In data-mining applications, we are frequently faced with a large fraction of missing entries in the data matrix, which is problematic for most discriminant machine learning algorithms.
no code implementations • 19 Oct 2012 • Guillaume Desjardins, Aaron Courville, Yoshua Bengio
Seen from a generative perspective, the multiplicative interactions emulate the entangling of factors of variation.
no code implementations • 18 Nov 2012 • Guillaume Alain, Yoshua Bengio
This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density.
no code implementations • 21 Nov 2012 • Razvan Pascanu, Tomas Mikolov, Yoshua Bengio
There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994).
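One remedy this paper proposes for the exploding-gradient problem is rescaling the gradient whenever its norm exceeds a threshold. A minimal sketch of that clipping step (threshold value and array shapes are illustrative assumptions):

```python
import numpy as np

def clip_gradient_norm(grad, threshold):
    """Rescale `grad` so its L2 norm does not exceed `threshold`,
    preserving its direction (gradient-norm clipping)."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

g = np.array([3.0, 4.0])               # norm 5.0
clipped = clip_gradient_norm(g, 1.0)   # norm rescaled to 1.0
```

The vanishing-gradient problem requires different tools (e.g. the regularization term the paper introduces); clipping only tames the exploding case.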
no code implementations • 23 Nov 2012 • Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, Yoshua Bengio
Theano is a linear algebra compiler that optimizes a user's symbolically-specified mathematical computations to produce efficient low-level implementations.
no code implementations • 4 Dec 2012 • Yoshua Bengio, Nicolas Boulanger-Lewandowski, Razvan Pascanu
After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, we review here several new developments that have allowed substantial progress, both in understanding and in technical solutions, towards more efficient training of recurrent networks.
no code implementations • 15 Jan 2013 • Xavier Glorot, Antoine Bordes, Jason Weston, Yoshua Bengio
Large-scale relational learning is becoming crucial for handling the huge amounts of structured data generated daily in many application domains, ranging from computational biology and information retrieval to natural language processing.
no code implementations • 16 Jan 2013 • Ian J. Goodfellow, Aaron Courville, Yoshua Bengio
We introduce a new method for training deep Boltzmann machines jointly.
no code implementations • 16 Jan 2013 • Razvan Pascanu, Yoshua Bengio
We evaluate natural gradient, an algorithm originally proposed in Amari (1997), for learning deep models.
1 code implementation • 17 Jan 2013 • Çağlar Gülçehre, Yoshua Bengio
We explore the effect of introducing prior information into the intermediate level of neural networks for a learning task on which all the state-of-the-art machine learning algorithms tested failed to learn.
7 code implementations • 18 Feb 2013 • Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio
We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout.
Ranked #34 on Image Classification on MNIST
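The model designed for dropout in this entry is the maxout unit, whose activation is the maximum over several learned affine pieces. A minimal NumPy sketch (dimensions and the number of pieces `k` are illustrative assumptions):

```python
import numpy as np

def maxout(x, W, b):
    """Maxout layer: each of the d_out units outputs the max over k
    affine pieces.  W: (d_in, d_out, k), b: (d_out, k)."""
    z = np.einsum('i,iok->ok', x, W) + b   # (d_out, k) affine pieces
    return z.max(axis=-1)                  # max over the k pieces

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
W = rng.standard_normal((5, 3, 4))         # d_in=5, d_out=3, k=4
b = rng.standard_normal((3, 4))
h = maxout(x, W, b)                        # shape (3,)
```

Because the max of affine functions is piecewise linear and convex, a maxout unit can approximate arbitrary convex activations, which is part of why it pairs well with dropout's model averaging.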
no code implementations • 2 May 2013 • Yoshua Bengio
Deep learning research aims at discovering learning algorithms that discover multiple levels of distributed representations, with higher levels representing more abstract concepts.
no code implementations • 14 May 2013 • Yoshua Bengio
The second approach we propose assumes that an estimator of the gradient can be back-propagated; it provides an unbiased estimator of the gradient, but it only works with non-linearities that, unlike the hard threshold but like the rectifier, are not flat over their entire range.
1 code implementation • NeurIPS 2013 • Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent
Recent work has shown how denoising and contractive autoencoders implicitly capture the structure of the data-generating density, in the case where the corruption noise is Gaussian, the reconstruction error is the squared error, and the data is continuous-valued.
3 code implementations • 5 Jun 2013 • Yoshua Bengio, Éric Thibodeau-Laufer, Guillaume Alain, Jason Yosinski
We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood.
11 code implementations • 1 Jul 2013 • Ian J. Goodfellow, Dumitru Erhan, Pierre Luc Carrier, Aaron Courville, Mehdi Mirza, Ben Hamner, Will Cukierski, Yichuan Tang, David Thaler, Dong-Hyun Lee, Yingbo Zhou, Chetan Ramaiah, Fangxiang Feng, Ruifan Li, Xiaojie Wang, Dimitris Athanasakis, John Shawe-Taylor, Maxim Milakov, John Park, Radu Ionescu, Marius Popescu, Cristian Grozea, James Bergstra, Jingjing Xie, Lukasz Romaszko, Bing Xu, Zhang Chuang, Yoshua Bengio
The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge.
Ranked #12 on Facial Expression Recognition (FER) on FER2013
1 code implementation • 15 Aug 2013 • Yoshua Bengio, Nicholas Léonard, Aaron Courville
Stochastic neurons and hard non-linearities can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic or non-smooth neurons?
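One of the estimators this paper studies is the straight-through estimator: use the hard non-linearity in the forward pass, but pretend it is the identity when back-propagating. A minimal sketch with a deterministic threshold (the stochastic-neuron variants in the paper add noise before thresholding):

```python
import numpy as np

def hard_threshold_forward(x):
    """Forward pass of a hard binary neuron: 1 if x > 0, else 0."""
    return (x > 0).astype(x.dtype)

def straight_through_backward(grad_output):
    """Straight-through estimator: the threshold's true derivative is
    zero almost everywhere, so we pass the gradient through unchanged,
    as if the neuron were the identity."""
    return grad_output

x = np.array([-0.5, 0.2, 1.5])
y = hard_threshold_forward(x)                        # binary activations
g = straight_through_backward(np.array([0.1, -0.3, 0.7]))
```

The estimator is biased, but it keeps a usable learning signal flowing through otherwise non-differentiable units.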
6 code implementations • 20 Aug 2013 • Ian J. Goodfellow, David Warde-Farley, Pascal Lamblin, Vincent Dumoulin, Mehdi Mirza, Razvan Pascanu, James Bergstra, Frédéric Bastien, Yoshua Bengio
Pylearn2 is a machine learning research library.
no code implementations • 7 Nov 2013 • Caglar Gulcehre, Kyunghyun Cho, Razvan Pascanu, Yoshua Bengio
In this paper we propose and investigate a novel nonlinear unit, called $L_p$ unit, for deep neural networks.
no code implementations • 24 Nov 2013 • Yoshua Bengio, Li Yao, Kyunghyun Cho
Several interesting generative learning algorithms involve a complex probability distribution over many random variables, involving intractable normalization constants or latent variable normalization.
no code implementations • NeurIPS 2013 • Yann Dauphin, Yoshua Bengio
Sparse high-dimensional data vectors are common in many application domains where a very large number of rarely non-zero features can be devised.
no code implementations • NeurIPS 2013 • Ian Goodfellow, Mehdi Mirza, Aaron Courville, Yoshua Bengio
We introduce the Multi-Prediction Deep Boltzmann Machine (MP-DBM).
no code implementations • 18 Dec 2013 • Vincent Dumoulin, Ian J. Goodfellow, Aaron Courville, Yoshua Bengio
Restricted Boltzmann machines (RBMs) are powerful machine learning models, but learning and some kinds of inference in the model require sampling-based approximations, which, in classical digital computers, are implemented using expensive MCMC.
no code implementations • 19 Dec 2013 • Sherjil Ozair, Li Yao, Yoshua Bengio
Generative Stochastic Networks (GSNs) have been recently introduced as an alternative to traditional probabilistic modeling: instead of parametrizing the data distribution directly, one parametrizes a transition operator for a Markov chain whose stationary distribution is an estimator of the data generating distribution.
no code implementations • 20 Dec 2013 • Razvan Pascanu, Guido Montufar, Yoshua Bengio
For a $k$ layer model with $n$ hidden units on each layer it is $\Omega(\left\lfloor {n}/{n_0}\right\rfloor^{k-1}n^{n_0})$.
no code implementations • 20 Dec 2013 • Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996).
1 code implementation • 21 Dec 2013 • Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, Yoshua Bengio
Catastrophic forgetting is a problem faced by many machine learning models and algorithms.
no code implementations • 21 Dec 2013 • David Warde-Farley, Ian J. Goodfellow, Aaron Courville, Yoshua Bengio
The recently introduced dropout training criterion for neural networks has been the subject of much attention due to its simplicity and remarkable effectiveness as a regularizer, as well as its interpretation as a training procedure for an exponentially large ensemble of networks that share parameters.
no code implementations • NeurIPS 2014 • Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio
We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have.
no code implementations • 19 May 2014 • Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio
Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for the ability of these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.
43 code implementations • 3 Jun 2014 • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN).
Ranked #47 on Machine Translation on WMT2014 English-French
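The hidden unit introduced in this paper (later known as the GRU) combines a reset gate and an update gate. A minimal NumPy sketch of one step; weight shapes are illustrative, and note that gate conventions (which gate multiplies the old state) vary across later papers:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x, h, p):
    """One step of a gated recurrent unit.  `p` holds weight matrices:
    W* map the input, U* map the previous hidden state."""
    z = sigmoid(p['Wz'] @ x + p['Uz'] @ h)             # update gate
    r = sigmoid(p['Wr'] @ x + p['Ur'] @ h)             # reset gate
    h_tilde = np.tanh(p['W'] @ x + p['U'] @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_tilde                 # gated interpolation

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
p = {k: rng.standard_normal((d_h, d_in if k.startswith('W') else d_h)) * 0.1
     for k in ['Wz', 'Uz', 'Wr', 'Ur', 'W', 'U']}
h_new = gru_cell(rng.standard_normal(d_in), np.zeros(d_h), p)
```

In the encoder-decoder, one such RNN encodes the source sequence into a fixed-length vector and a second one decodes the target sequence from it.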
1 code implementation • 5 Jun 2014 • Tapani Raiko, Li Yao, Kyunghyun Cho, Yoshua Bengio
Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data.
Ranked #7 on Image Generation on Binarized MNIST
183 code implementations • NeurIPS 2014 • Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.
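The adversarial objective can be sketched as two losses over the discriminator's scores: D maximizes log D(x) + log(1 - D(G(z))), while G (in the non-saturating heuristic the paper recommends early in training) maximizes log D(G(z)). A minimal sketch with hypothetical score vectors standing in for D's outputs:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Negative of D's objective: maximize log D(x) + log(1 - D(G(z)))."""
    return -(np.log(d_real) + np.log(1.0 - d_fake)).mean()

def generator_loss_nonsaturating(d_fake):
    """Non-saturating heuristic: G maximizes log D(G(z)) rather than
    minimizing log(1 - D(G(z))), giving stronger early gradients."""
    return -np.log(d_fake).mean()

d_real = np.array([0.9, 0.8])   # D's scores on real samples
d_fake = np.array([0.2, 0.3])   # D's scores on generated samples
ld = discriminator_loss(d_real, d_fake)
lg = generator_loss_nonsaturating(d_fake)
```

Training alternates gradient steps on the two losses; at the game's equilibrium D outputs 1/2 everywhere and G matches the data distribution.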
4 code implementations • NeurIPS 2014 • Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio
Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.
2 code implementations • 11 Jun 2014 • Jörg Bornschein, Yoshua Bengio
The wake-sleep algorithm relies on training not just the directed generative model but also a conditional generative model (the inference network) that runs backward from visible to latent, estimating the posterior distribution of latent given visible.
no code implementations • 28 Jun 2014 • Kyunghyun Cho, Yoshua Bengio
Conditional computation has been proposed as a way to increase the capacity of a deep neural network without increasing the amount of computation required, by activating some parameters and computation "on-demand", on a per-example basis.
no code implementations • 29 Jul 2014 • Yoshua Bengio
We propose to exploit {\em reconstruction} as a layer-local training signal for deep learning.
121 code implementations • 1 Sep 2014 • Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
Neural machine translation is a recently proposed approach to machine translation.
Ranked #4 on Dialogue Generation on Persona-Chat (using extra training data)
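The key mechanism of this paper is additive (soft) attention: the decoder scores every encoder annotation against its current state, softmaxes the scores, and forms a weighted context vector. A minimal NumPy sketch with illustrative dimensions and weight names:

```python
import numpy as np

def additive_attention(s, H, Wa, Ua, va):
    """Additive attention: align decoder state `s` (d,) with encoder
    annotations `H` (T, d); return the context vector and weights."""
    e = np.tanh(H @ Ua.T + s @ Wa.T) @ va    # (T,) alignment scores
    a = np.exp(e - e.max())                  # stable softmax
    a /= a.sum()
    return a @ H, a                          # (d,) context, (T,) weights

rng = np.random.default_rng(0)
T, d = 6, 4
H = rng.standard_normal((T, d))              # encoder annotations
s = rng.standard_normal(d)                   # current decoder state
Wa, Ua = rng.standard_normal((d, d)), rng.standard_normal((d, d))
va = rng.standard_normal(d)
ctx, weights = additive_attention(s, H, Wa, Ua, va)
```

Because the context is recomputed at every decoding step, the model is no longer forced to squash the whole source sentence into one fixed-length vector.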
no code implementations • 2 Sep 2014 • Li Yao, Sherjil Ozair, Kyunghyun Cho, Yoshua Bengio
Orderless NADEs are trained based on a criterion that stochastically maximizes $P(\mathbf{x})$ with all possible orders of factorizations.
no code implementations • WS 2014 • Jean Pouget-Abadie, Dzmitry Bahdanau, Bart van Merrienboer, Kyunghyun Cho, Yoshua Bengio
The authors of (Cho et al., 2014a) have shown that the recently introduced neural network translation systems suffer from a significant drop in translation quality when translating long sentences, unlike existing phrase-based translation systems.
2 code implementations • 3 Sep 2014 • Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio
In this paper, we focus on analyzing the properties of neural machine translation using two models: the RNN Encoder-Decoder and a newly proposed gated recursive convolutional neural network.
no code implementations • 1 Oct 2014 • Guillaume Desjardins, Heng Luo, Aaron Courville, Yoshua Bengio
Restricted Boltzmann Machines (RBMs) are one of the fundamental building blocks of deep learning.
no code implementations • 2 Oct 2014 • Sherjil Ozair, Yoshua Bengio
The objective is to learn an encoder $f(\cdot)$ that maps $X$ to $f(X)$ that has a much simpler distribution than $X$ itself, estimated by $P(H)$.
no code implementations • 2 Oct 2014 • Felix Hill, Kyunghyun Cho, Sebastien Jean, Coline Devin, Yoshua Bengio
Neural language models learn word representations that capture rich linguistic and conceptual information.
2 code implementations • 9 Oct 2014 • Stephan Gouws, Yoshua Bengio, Greg Corrado
We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data.
Ranked #1 on Document Classification on Reuters En-De
19 code implementations • 30 Oct 2014 • Laurent Dinh, David Krueger, Yoshua Bengio
It is based on the idea that a good representation is one in which the data has a distribution that is easy to model.
Ranked #73 on Image Generation on CIFAR-10 (bits/dimension metric)
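The building block of this paper (NICE) is the additive coupling layer: half the variables pass through unchanged, the other half are shifted by an arbitrary function of the first half, so the transformation is trivially invertible and volume-preserving. A minimal sketch, with `tanh` as a stand-in for the learned coupling function:

```python
import numpy as np

def coupling_forward(x, m):
    """Additive coupling: y1 = x1, y2 = x2 + m(x1)."""
    d = x.size // 2
    x1, x2 = x[:d], x[d:]
    return np.concatenate([x1, x2 + m(x1)])

def coupling_inverse(y, m):
    """Exact inverse: subtract the same shift.  The Jacobian is unit
    triangular, so |det J| = 1 (volume-preserving)."""
    d = y.size // 2
    y1, y2 = y[:d], y[d:]
    return np.concatenate([y1, y2 - m(y1)])

m = lambda h: np.tanh(h) * 2.0        # any function of x1 works here
x = np.array([0.5, -1.0, 2.0, 0.3])
y = coupling_forward(x, m)
x_rec = coupling_inverse(y, m)        # recovers x exactly
```

Stacking such layers with alternating splits yields an expressive bijection whose log-likelihood is exactly computable, which is what makes the distribution "easy to model".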
3 code implementations • NeurIPS 2014 • Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson
Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks.
1 code implementation • NeurIPS 2014 • Tapani Raiko, Yao Li, Kyunghyun Cho, Yoshua Bengio
Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data.
Ranked #8 on Image Generation on Binarized MNIST
1 code implementation • NeurIPS 2014 • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
We propose a new framework for estimating generative models via adversarial nets, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.
no code implementations • 4 Dec 2014 • Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
We replace the Hidden Markov Model (HMM), which is traditionally used in continuous speech recognition, with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes.
1 code implementation • IJCNLP 2015 • Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio
The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models.
13 code implementations • 11 Dec 2014 • Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
In this paper we compare different types of recurrent units in recurrent neural networks (RNNs).
Ranked #10 on Music Modeling on JSB Chorales
4 code implementations • 17 Dec 2014 • Grégoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio
Sentiment analysis is a common task in natural language processing that aims to detect polarity of a text document (typically a consumer review).
3 code implementations • 19 Dec 2014 • Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio
In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.
no code implementations • 19 Dec 2014 • Felix Hill, Kyunghyun Cho, Sebastien Jean, Coline Devin, Yoshua Bengio
Here we investigate the embeddings learned by neural machine translation models, a recently-developed class of neural language model.
1 code implementation • 22 Dec 2014 • Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David
For each of those datasets and for each of those formats, we assess the impact of the precision of the multiplications on the final error after training.
1 code implementation • 23 Dec 2014 • Dong-Hyun Lee, Saizheng Zhang, Asja Fischer, Yoshua Bengio
Back-propagation has been the workhorse of recent successes of deep learning but it relies on infinitesimal effects (partial derivatives) in order to perform credit assignment.
no code implementations • 23 Dec 2014 • Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio
The convergence of SGD depends on the careful choice of learning rate and the amount of the noise in stochastic estimates of the gradients.
no code implementations • 9 Feb 2015 • Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
In this work, we propose a novel recurrent neural network (RNN) architecture.
88 code implementations • 10 Feb 2015 • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio
Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images.
no code implementations • 14 Feb 2015 • Yoshua Bengio, Dong-Hyun Lee, Jorg Bornschein, Thomas Mesnard, Zhouhan Lin
Neuroscientists have long criticised deep learning algorithms as incompatible with current knowledge of neurobiology.
2 code implementations • NeurIPS 2015 • Yann N. Dauphin, Harm de Vries, Yoshua Bengio
Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks.
no code implementations • 5 Mar 2015 • Samira Ebrahimi Kahou, Xavier Bouthillier, Pascal Lamblin, Caglar Gulcehre, Vincent Michalski, Kishore Konda, Sébastien Jean, Pierre Froumenty, Yann Dauphin, Nicolas Boulanger-Lewandowski, Raul Chandias Ferrari, Mehdi Mirza, David Warde-Farley, Aaron Courville, Pascal Vincent, Roland Memisevic, Christopher Pal, Yoshua Bengio
The task of the emotion recognition in the wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood style movies.
no code implementations • 11 Mar 2015 • Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio
Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation.
no code implementations • 18 Mar 2015 • Guillaume Alain, Yoshua Bengio, Li Yao, Jason Yosinski, Eric Thibodeau-Laufer, Saizheng Zhang, Pascal Vincent
We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood.
2 code implementations • TACL 2016 • Felix Hill, Kyunghyun Cho, Anna Korhonen, Yoshua Bengio
Distributional models that learn rich semantic word representations are a success story of recent NLP research.
4 code implementations • 3 May 2015 • Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio
In this paper, we propose a deep neural network architecture for object recognition based on recurrent neural networks.
Ranked #34 on Image Classification on MNIST
15 code implementations • 13 May 2015 • Mohammad Havaei, Axel Davy, David Warde-Farley, Antoine Biard, Aaron Courville, Yoshua Bengio, Chris Pal, Pierre-Marc Jodoin, Hugo Larochelle
Finally, we explore a cascade architecture in which the output of a basic CNN is treated as an additional source of information for a subsequent CNN.
Ranked #1 on Brain Tumor Segmentation on BRATS-2013 leaderboard
5 code implementations • 1 Jun 2015 • Bart van Merriënboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, Yoshua Bengio
We introduce two Python frameworks to train neural networks on large datasets: Blocks and Fuel.
5 code implementations • NeurIPS 2015 • Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio
In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder.
1 code implementation • 12 Jun 2015 • Jorg Bornschein, Samira Shabanian, Asja Fischer, Yoshua Bengio
We present a lower-bound for the likelihood of this model and we show that optimizing this bound regularizes the model so that the Bhattacharyya distance between the bottom-up and top-down approximate distributions is minimized.
14 code implementations • NeurIPS 2015 • Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio
Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation.
Ranked #17 on Speech Recognition on TIMIT
no code implementations • 4 Jul 2015 • Kyunghyun Cho, Aaron Courville, Yoshua Bengio
Whereas deep neural networks were first mostly used for classification tasks, they are rapidly expanding in the realm of structured output problems, where the observed target is composed of multiple random variables that have a rich joint distribution, given the input.
4 code implementations • 8 Jul 2015 • Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob G. Simonsen, Jian-Yun Nie
Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity.
7 code implementations • 17 Jul 2015 • Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau
We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models.
no code implementations • 21 Jul 2015 • Alex Auvolat, Sarath Chandar, Pascal Vincent, Hugo Larochelle, Yoshua Bengio
Efficient Maximum Inner Product Search (MIPS) is an important task that has a wide applicability in recommendation systems and classification with a large number of classes.
1 code implementation • 31 Jul 2015 • Alexandre de Brébisson, Étienne Simon, Alex Auvolat, Pascal Vincent, Yoshua Bengio
We describe our first-place solution to the ECML/PKDD discovery challenge on taxi destination prediction.
1 code implementation • 18 Aug 2015 • Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio
Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs).
no code implementations • 19 Sep 2015 • Yoshua Bengio, Thomas Mesnard, Asja Fischer, Saizheng Zhang, Yuhuai Wu
We introduce a weight update formula that is expressed only in terms of firing rates and their derivatives and that results in changes consistent with those associated with spike-timing dependent plasticity (STDP) rules and biological observations, even though the explicit timing of spikes is not needed.
no code implementations • 5 Oct 2015 • César Laurent, Gabriel Pereyra, Philémon Brakel, Ying Zhang, Yoshua Bengio
Recurrent Neural Networks (RNNs) are powerful models for sequential data that have the potential to learn long-term dependencies.
no code implementations • 9 Oct 2015 • Yoshua Bengio, Asja Fischer
We show that Langevin MCMC inference in an energy-based model with latent variables has the property that the early steps of inference, starting from a stationary point, correspond to propagating error gradients into internal layers, similarly to back-propagation.
2 code implementations • 11 Oct 2015 • Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio
For most deep learning algorithms training is notoriously time consuming.
5 code implementations • NeurIPS 2015 • Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David
We introduce BinaryConnect, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated.
Ranked #30 on Image Classification on SVHN
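The core of BinaryConnect fits in a few lines: binarize the stored real-valued weights for the forward and backward passes, but apply the gradient update to the full-precision weights and clip them to [-1, 1]. A minimal sketch with illustrative values (the paper also describes a stochastic binarization variant):

```python
import numpy as np

def binarize(W):
    """Deterministic binarization used during propagation: sign(W)."""
    return np.where(W >= 0, 1.0, -1.0)

def binaryconnect_step(W, grad, lr=0.1):
    """Apply the gradient (computed with the binarized weights) to the
    real-valued weights, then clip to [-1, 1] as in the paper."""
    W = W - lr * grad
    return np.clip(W, -1.0, 1.0)

W = np.array([0.3, -0.7, 0.05])            # stored full-precision weights
Wb = binarize(W)                           # weights used in propagation
W = binaryconnect_step(W, grad=np.array([2.0, -2.0, 0.0]))
```

Keeping the accumulator in full precision is essential: the small gradient steps would otherwise be lost entirely under binary weights.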
1 code implementation • 14 Nov 2015 • Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio
The task of associating images and videos with a natural language description has attracted a great amount of attention recently.
no code implementations • 19 Nov 2015 • Mohammad Pezeshki, Linxi Fan, Philemon Brakel, Aaron Courville, Yoshua Bengio
Although the empirical results are impressive, the Ladder Network has many components intertwined, whose contributions are not obvious in such a complex architecture.
no code implementations • 19 Nov 2015 • Daniel Jiwoong Im, Sungjin Ahn, Roland Memisevic, Yoshua Bengio
Denoising autoencoders (DAE) are trained to reconstruct their clean inputs with noise injected at the input level, while variational autoencoders (VAE) are trained with noise injected in their stochastic hidden layer, with a regularizer that encourages this noise injection.
1 code implementation • 19 Nov 2015 • Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio
Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss.
2 code implementations • 20 Nov 2015 • Martin Arjovsky, Amar Shah, Yoshua Bengio
When the eigenvalues of the hidden to hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies.
Ranked #26 on Sequential Image Classification on Sequential MNIST
1 code implementation • 20 Nov 2015 • Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, Yoshua Bengio
This leads the model to update using an unbiased estimate of the gradient which also has minimum variance when the sampling proposal is proportional to the L2-norm of the gradient.
2 code implementations • 22 Nov 2015 • Francesco Visin, Marco Ciccone, Adriana Romero, Kyle Kastner, Kyunghyun Cho, Yoshua Bengio, Matteo Matteucci, Aaron Courville
Moreover, ReNet layers are stacked on top of pre-trained convolutional layers, benefiting from generic local features.
Ranked #18 on Semantic Segmentation on CamVid
no code implementations • NAACL 2016 • Orhan Firat, Kyunghyun Cho, Yoshua Bengio
We propose multi-way, multilingual neural machine translation.
26 code implementations • 9 Feb 2016 • Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio
We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time.
2 code implementations • 16 Feb 2016 • Benjamin Scellier, Yoshua Bengio
Because the objective function is defined in terms of local perturbations, the second phase of Equilibrium Propagation corresponds to only nudging the prediction (fixed point, or stationary distribution) towards a configuration that reduces prediction error.
no code implementations • NeurIPS 2016 • Saizheng Zhang, Yuhuai Wu, Tong Che, Zhouhan Lin, Roland Memisevic, Ruslan Salakhutdinov, Yoshua Bengio
In this paper, we systematically analyze the connecting architectures of recurrent neural networks (RNNs).
Ranked #23 on Language Modelling on Text8
1 code implementation • 1 Mar 2016 • Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio
Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only).
2 code implementations • ACL 2016 • Junyoung Chung, Kyunghyun Cho, Yoshua Bengio
The existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation.
Ranked #3 on Machine Translation on WMT2015 English-German
1 code implementation • ACL 2016 • Iulian Vlad Serban, Alberto García-Durán, Caglar Gulcehre, Sungjin Ahn, Sarath Chandar, Aaron Courville, Yoshua Bengio
Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances.
no code implementations • ACL 2016 • Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bo-Wen Zhou, Yoshua Bengio
At each time-step, the decision of which softmax layer to use is made adaptively by an MLP conditioned on the context. We motivate our work with psychological evidence that humans naturally tend to point towards objects in the context or the environment when the name of an object is not known. We observe improvements with our proposed model on two tasks: neural machine translation on the Europarl English-to-French parallel corpora and text summarization on the Gigaword dataset.
1 code implementation • 9 May 2016 • The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang
Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.
9 code implementations • 19 May 2016 • Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio
Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue.
no code implementations • 24 May 2016 • Sarath Chandar, Sungjin Ahn, Hugo Larochelle, Pascal Vincent, Gerald Tesauro, Yoshua Bengio
In this paper, we explore a form of hierarchical memory network, which can be considered as a hybrid between hard and soft attention memory networks.
4 code implementations • 2 Jun 2016 • Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bo-Wen Zhou, Yoshua Bengio, Aaron Courville
We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens.
Ranked #1 on Dialogue Generation on Ubuntu Dialogue (Activity)
6 code implementations • 3 Jun 2016 • David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal
We propose zoneout, a novel method for regularizing RNNs.
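The per-unit zoneout update can be sketched per time-step as follows (a minimal illustration; the paper applies it inside the LSTM cell and hidden-state updates): instead of dropping units, zoneout stochastically preserves their previous values.

```python
import numpy as np

def zoneout(h_prev, h_new, rate, rng, training=True):
    """Zoneout sketch: per unit, keep the previous hidden state with
    probability `rate`, otherwise take the newly computed one.
    At test time, use the expected (deterministic) mixture instead."""
    if not training:
        return rate * h_prev + (1 - rate) * h_new
    keep_prev = rng.random(h_prev.shape) < rate
    return np.where(keep_prev, h_prev, h_new)
```

A rate of 0 recovers the ordinary RNN update, while a rate of 1 freezes the state entirely.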
no code implementations • 6 Jun 2016 • Yoshua Bengio, Benjamin Scellier, Olexa Bilaniuk, Joao Sacramento, Walter Senn
We find conditions under which a simple feedforward computation is a very good initialization for inference, after the input units are clamped to observed values.
1 code implementation • 7 Jun 2016 • Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio
We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document.
Ranked #3 on Question Answering on Children's Book Test (Accuracy-NE metric)
no code implementations • 10 Jun 2016 • Taesup Kim, Yoshua Bengio
Training energy-based probabilistic models is confronted with apparently intractable sums, whose Monte Carlo estimation requires sampling from the estimated probability distribution in the inner loop of training.
no code implementations • 18 Jun 2016 • Xu-Yao Zhang, Yoshua Bengio, Cheng-Lin Liu
Furthermore, although directMap+convNet can achieve the best results and surpass human-level performance, we show that writer adaptation in this case is still effective.
Data Augmentation • Offline Handwritten Chinese Character Recognition
1 code implementation • 21 Jun 2016 • Xu-Yao Zhang, Fei Yin, Yan-Ming Zhang, Cheng-Lin Liu, Yoshua Bengio
In this paper, we propose a framework by using the recurrent neural network (RNN) as both a discriminative model for recognizing Chinese characters and a generative model for drawing (generating) Chinese characters.
no code implementations • NeurIPS 2016 • Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, Ruslan Salakhutdinov
We introduce a general and simple structural design called Multiplicative Integration (MI) to improve recurrent neural networks (RNNs).
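The general form of Multiplicative Integration replaces the additive pre-activation Wx + Uh with a gated second-order term; a minimal sketch of that pre-activation (parameter names follow the paper's general formulation):

```python
import numpy as np

def mi_preactivation(Wx, Uh, alpha, beta1, beta2, bias):
    """Multiplicative Integration pre-activation (general form):
    a second-order term alpha * Wx * Uh plus gated first-order terms,
    in place of the vanilla additive Wx + Uh."""
    return alpha * Wx * Uh + beta1 * Wx + beta2 * Uh + bias
```

Setting alpha to zero and both betas to one recovers the standard additive RNN pre-activation, so MI strictly generalizes it.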
no code implementations • 30 Jun 2016 • Caglar Gulcehre, Sarath Chandar, Kyunghyun Cho, Yoshua Bengio
We investigate the mechanisms and effects of learning to read and write into a memory through experiments on the Facebook bAbI tasks, using both a feedforward and a GRU controller.
Ranked #5 on Question Answering on bAbi
1 code implementation • 3 Jul 2016 • Heeyoul Choi, Kyunghyun Cho, Yoshua Bengio
Based on this observation, in this paper we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence.
1 code implementation • 18 Jul 2016 • Mohammad Havaei, Nicolas Guizard, Nicolas Chapados, Yoshua Bengio
We introduce a deep learning image segmentation framework that is extremely robust to missing imaging modalities.
Ranked #98 on Semantic Segmentation on NYU Depth v2
3 code implementations • 24 Jul 2016 • Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio
We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL).
Ranked #8 on Machine Translation on IWSLT2015 English-German
no code implementations • 1 Aug 2016 • Sungjin Ahn, Heeyoul Choi, Tanel Pärnamaa, Yoshua Bengio
Current language models have a significant limitation in the ability to encode and decode factual knowledge.
no code implementations • 17 Aug 2016 • Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio
The optimization of deep neural networks can be more challenging than traditional convex optimization problems due to the highly non-convex nature of the loss function; e.g., it can involve pathological landscapes such as saddle surfaces that are difficult to escape for algorithms based on simple gradient descent.
1 code implementation • 24 Aug 2016 • Joachim Ott, Zhouhan Lin, Ying Zhang, Shih-Chii Liu, Yoshua Bengio
We present results from the use of different stochastic and deterministic reduced precision training methods applied to three major RNN types which are then tested on several datasets.
3 code implementations • 6 Sep 2016 • Junyoung Chung, Sungjin Ahn, Yoshua Bengio
Multiscale recurrent neural networks have been considered a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of model can actually capture temporal dependencies by discovering the latent hierarchical structure of the sequence.
Ranked #19 on Language Modelling on Text8
5 code implementations • 22 Sep 2016 • Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio
Quantized recurrent neural networks were tested on the Penn Treebank dataset and achieved accuracy comparable to their 32-bit counterparts using only 4 bits.
1 code implementation • 5 Oct 2016 • Guillaume Alain, Yoshua Bengio
Neural network models have a reputation for being black boxes.
1 code implementation • NeurIPS 2016 • Alex Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, Yoshua Bengio
We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps.
1 code implementation • 21 Nov 2016 • Joachim Ott, Zhouhan Lin, Ying Zhang, Shih-Chii Liu, Yoshua Bengio
Recurrent Neural Networks (RNNs) produce state-of-art performance on many machine learning tasks but their demand on resources in terms of memory and computational power are often high.
no code implementations • 27 Nov 2016 • Dmitriy Serdyuk, Kartik Audhkhasi, Philémon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio
Ensuring such robustness to variability is a challenge in modern day neural network-based ASR systems, especially when all types of variability are not seen during training.
Automatic Speech Recognition (ASR) +4
22 code implementations • 28 Nov 2016 • Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio
State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs).
Ranked #9 on Semantic Segmentation on CamVid
5 code implementations • 28 Nov 2016 • Adriana Romero, Pierre Luc Carrier, Akram Erraqabi, Tristan Sylvain, Alex Auvolat, Etienne Dejoie, Marc-André Legault, Marie-Pierre Dubé, Julie G. Hussin, Yoshua Bengio
It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g., for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units).
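The parameter-prediction idea described above can be sketched as follows (a deliberate simplification: a single linear map stands in for the parameter prediction network, and all names are illustrative): each input feature's embedding is mapped to that feature's row of weights in the classifier's first layer, so the number of free parameters no longer scales with the number of input features.

```python
import numpy as np

def fat_layer_weights(feature_embeddings, V, b):
    """Predict the classifier's first-layer weights from per-feature
    embeddings.  feature_embeddings: (d_features, e); V: (e, n_hidden);
    b: (n_hidden,).  Returns a (d_features, n_hidden) weight matrix."""
    return feature_embeddings @ V + b
```

The predicted matrix is then used like an ordinary dense layer's weights, while only V and b (and the embeddings) are learned.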
1 code implementation • CVPR 2017 • Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski
PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable "condition" network C that tells the generator what to draw.
no code implementations • 7 Dec 2016 • Tong Che, Yan-ran Li, Athul Paul Jacob, Yoshua Bengio, Wenjie Li
Although Generative Adversarial Networks achieve state-of-the-art results on a variety of generative tasks, they are regarded as highly unstable and prone to miss modes.
no code implementations • 12 Dec 2016 • Mehdi Mirza, Aaron Courville, Yoshua Bengio
In this work, we explore the potential of unsupervised learning to find features that promote better generalization to settings outside the supervised training distribution.
no code implementations • 19 Dec 2016 • Mihir Mongia, Kundan Kumar, Akram Erraqabi, Yoshua Bengio
Recent work in the literature has shown experimentally that one can use the lower layers of a trained convolutional neural network (CNN) to model natural textures.
4 code implementations • 22 Dec 2016 • Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, Yoshua Bengio
In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time.
no code implementations • 30 Jan 2017 • Caglar Gulcehre, Sarath Chandar, Yoshua Bengio
We use discrete addressing for read/write operations, which helps to substantially reduce the vanishing gradient problem with very long sequences.
no code implementations • 16 Feb 2017 • Michal Drozdzal, Gabriel Chartrand, Eugene Vorontsov, Lisa Di Jorio, An Tang, Adriana Romero, Yoshua Bengio, Chris Pal, Samuel Kadoury
Moreover, when applying our 2D pipeline on a challenging 3D MRI prostate segmentation challenge we reach results that are competitive even when compared to 3D methods.
no code implementations • 26 Feb 2017 • Tong Che, Yan-ran Li, Ruixiang Zhang, R. Devon Hjelm, Wenjie Li, Yangqiu Song, Yoshua Bengio
Despite the successes in capturing continuous distributions, the application of generative adversarial networks (GANs) to discrete settings, like natural language tasks, is rather restricted.
6 code implementations • 27 Feb 2017 • R. Devon Hjelm, Athul Paul Jacob, Tong Che, Adam Trischler, Kyunghyun Cho, Yoshua Bengio
We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator.
1 code implementation • 2 Mar 2017 • Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio
The information about the element-wise curvature of the loss function is estimated from the local statistics of the stochastic first order gradients.
52 code implementations • 9 Mar 2017 • Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bo-Wen Zhou, Yoshua Bengio
This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention.
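The self-attention step at the heart of this model can be sketched directly from its defining equation (a hedged illustration; shapes follow the paper's notation, with H the matrix of LSTM hidden states): the annotation matrix A = softmax(W_s2 tanh(W_s1 H^T)) yields r weighted views of the sentence, and M = A H is the matrix sentence embedding.

```python
import numpy as np

def self_attentive_embedding(H, W_s1, W_s2):
    """Structured self-attention sketch.  H: (n, d) hidden states;
    W_s1: (d_a, d); W_s2: (r, d_a).  Returns (A, M) with A the (r, n)
    annotation matrix and M = A @ H the (r, d) sentence embedding."""
    scores = W_s2 @ np.tanh(W_s1 @ H.T)          # (r, n)
    scores -= scores.max(axis=1, keepdims=True)  # for numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # softmax over the n tokens
    return A, A @ H
```

Each of the r rows of A is a distribution over tokens, so each row of M is a convex combination of hidden states attending to a different aspect of the sentence.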
no code implementations • ICML 2017 • Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio
Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice.
no code implementations • 22 Mar 2017 • Emmanuel Bengio, Valentin Thomas, Joelle Pineau, Doina Precup, Yoshua Bengio
Finding features that disentangle the different causes of variation in real data is a difficult task, that has nonetheless received considerable attention in static domains like natural images.
no code implementations • 23 Mar 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially when adverse acoustic conditions characterized by non-stationary noises and reverberation are met.
no code implementations • 24 Mar 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
Improving distant speech recognition is a crucial step towards flexible human-machine interfaces.
2 code implementations • 25 Mar 2017 • Joseph Paul Cohen, Genevieve Boucher, Craig A. Glastonbury, Henry Z. Lo, Yoshua Bengio
Our contribution is redundant counting instead of predicting a density map in order to average over errors.
1 code implementation • ICLR 2018 • Adriana Romero, Michal Drozdzal, Akram Erraqabi, Simon Jégou, Yoshua Bengio
We experimentally find that the proposed iterative inference from conditional score estimation by conditional denoising autoencoders performs better than comparable models based on CRFs or those not using any explicit modeling of the conditional joint distribution of outputs.
9 code implementations • ICLR 2018 • Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J. Pal
Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models.
Ranked #3 on Music Transcription on MusicNet
2 code implementations • 29 May 2017 • Margaux Luck, Tristan Sylvain, Héloïse Cardinal, Andrea Lodi, Yoshua Bengio
An accurate model of patient-specific kidney graft survival distributions can help to improve shared-decision making in the treatment and care of patients.
no code implementations • ICLR 2018 • Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio
Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare.
Ranked #48 on Question Answering on SQuAD1.1 dev
1 code implementation • 8 Jun 2017 • Li Jing, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio
We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory.
Ranked #7 on Question Answering on bAbi (Accuracy (trained on 1k) metric)
1 code implementation • 13 Jun 2017 • Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio
We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation.
no code implementations • 14 Jun 2017 • Sandeep Subramanian, Tong Wang, Xingdi Yuan, Saizheng Zhang, Yoshua Bengio, Adam Trischler
We propose a two-stage neural model to tackle question generation from documents.
2 code implementations • ICML 2017 • Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness.
no code implementations • ICLR 2018 • Karan Grewal, R. Devon Hjelm, Yoshua Bengio
We hypothesize that this approach ensures a non-zero gradient to the generator, even in the limit of a perfect classifier.
no code implementations • 3 Jul 2017 • Bart van Merriënboer, Amartya Sanyal, Hugo Larochelle, Yoshua Bengio
We propose a generalization of neural network sequence models.
no code implementations • 19 Jul 2017 • Taesup Kim, Inchul Song, Yoshua Bengio
Layer normalization is a recently introduced technique for normalizing the activities of neurons in deep neural networks to improve the training speed and stability.
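For reference, the layer normalization operation being studied here (as introduced by Ba et al., sketched minimally) normalizes each example across its features rather than across the batch, then applies a learned gain and bias:

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    """Layer normalization: per example, subtract the mean and divide by
    the standard deviation computed over the feature axis, then rescale."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gain * (x - mean) / np.sqrt(var + eps) + bias
```

Because the statistics are per-example, the operation behaves identically at train and test time and is independent of batch size.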
no code implementations • WS 2017 • Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio
We investigate the integration of a planning mechanism into an encoder-decoder architecture with attention.
no code implementations • 3 Aug 2017 • Valentin Thomas, Jules Pondard, Emmanuel Bengio, Marc Sarfati, Philippe Beaudoin, Marie-Jean Meurs, Joelle Pineau, Doina Precup, Yoshua Bengio
It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation.
2 code implementations • ICLR 2018 • Dmitriy Serdyuk, Nan Rosemary Ke, Alessandro Sordoni, Adam Trischler, Chris Pal, Yoshua Bengio
We propose a simple technique for encouraging generative RNNs to plan ahead.
1 code implementation • ACL 2017 • Ryan Lowe, Michael Noseworthy, Iulian V. Serban, Nicolas Angelard-Gontier, Yoshua Bengio, Joelle Pineau
Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem.
no code implementations • 7 Sep 2017 • Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeshwar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio
By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble.
1 code implementation • 25 Sep 2017 • Yoshua Bengio
To the extent that these assumptions are generally true (and the form of natural language seems consistent with them), they can form a useful prior for representation learning.
1 code implementation • 29 Sep 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
First, we suggest to remove the reset gate in the GRU design, resulting in a more efficient single-gate architecture.
no code implementations • ICLR 2018 • Stanisław Jastrzębski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio
In general, a Resnet block tends to concentrate representation learning behavior in the first few layers while higher layers perform iterative refinement of features.
1 code implementation • ICLR 2018 • Philemon Brakel, Yoshua Bengio
We propose to learn independent features with adversarial objectives which optimize such measures implicitly.
no code implementations • 16 Oct 2017 • Kenji Kawaguchi, Leslie Pack Kaelbling, Yoshua Bengio
This paper provides theoretical insights into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, responding to an open question in the literature.
1 code implementation • ICLR 2018 • Samira Ebrahimi Kahou, Vincent Michalski, Adam Atkinson, Akos Kadar, Adam Trischler, Yoshua Bengio
Resolving such questions often requires reference to multiple plot elements and synthesis of information distributed spatially throughout a figure.
Ranked #3 on Visual Question Answering (VQA) on FigureQA - test 1
90 code implementations • ICLR 2018 • Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio
We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.
Ranked #1 on Node Classification on Pubmed (Validation metric)
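A single-head graph attention layer can be sketched in a few lines (an unoptimized illustration of the GAT mechanism with my own variable names; the paper uses multi-head attention and a LeakyReLU slope of 0.2): each node attends over its neighbours with coefficients computed from a shared linear map and an attention vector.

```python
import numpy as np

def gat_layer(X, adj, W, a):
    """Single-head graph attention.  X: (n, d_in) node features;
    adj: (n, n) 0/1 adjacency including self-loops; W: (d_in, d_out)
    shared linear map; a: (2*d_out,) attention vector."""
    H = X @ W                                    # transformed features
    n = H.shape[0]
    e = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # e_ij = LeakyReLU(a^T [h_i || h_j])
            z = np.concatenate([H[i], H[j]]) @ a
            e[i, j] = z if z > 0 else 0.2 * z
    e = np.where(adj > 0, e, -np.inf)            # mask non-neighbours
    e -= e.max(axis=1, keepdims=True)            # numerical stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)    # softmax over neighbours
    return alpha @ H                             # attention-weighted aggregation
```

With only self-loops in the adjacency matrix, each node attends solely to itself and the layer reduces to the shared linear map.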
1 code implementation • ICLR 2018 • Konrad Zolna, Devansh Arpit, Dendi Suhubdy, Yoshua Bengio
We show that our regularization term is upper bounded by the expectation-linear dropout objective which has been shown to address the gap due to the difference between the train and inference phases of dropout.
Ranked #28 on Language Modelling on Penn Treebank (Word Level)
no code implementations • 4 Nov 2017 • Stylianos Ioannis Mimilakis, Konstantinos Drossos, João F. Santos, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio
Singing voice separation based on deep learning relies on the usage of time-frequency masking.
Sound • Audio and Speech Processing
no code implementations • ICLR 2018 • Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Laurent Charlin, Chris Pal, Yoshua Bengio
A major drawback of backpropagation through time (BPTT) is the difficulty of learning long-term dependencies, coming from having to propagate credit information backwards through every single step of the forward computation.
1 code implementation • NeurIPS 2017 • Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio
The energy function is then modified so the model and data distributions match, with no guarantee on the number of steps required for the Markov chain to converge.
no code implementations • ICLR 2018 • Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
In particular we find that the ratio of learning rate to batch size is a key determinant of SGD dynamics and of the width of the final minima, and that higher values of the ratio lead to wider minima and often better generalization.
no code implementations • 13 Nov 2017 • Anirudh Goyal, Nan Rosemary Ke, Alex Lamb, R. Devon Hjelm, Chris Pal, Joelle Pineau, Yoshua Bengio
This makes it fundamentally difficult to train GANs with discrete data, as generation in this case typically involves a non-differentiable function.
no code implementations • ICLR 2018 • Samira Shabanian, Devansh Arpit, Adam Trischler, Yoshua Bengio
Bidirectional LSTMs (Bi-LSTMs), on the other hand, model sequences along both forward and backward directions and are generally known to perform better at such tasks because they capture a richer representation of the data.
1 code implementation • NeurIPS 2017 • Anirudh Goyal, Alessandro Sordoni, Marc-Alexandre Côté, Nan Rosemary Ke, Yoshua Bengio
Stochastic recurrent models have been successful in capturing the variability observed in natural sequential data such as speech.
1 code implementation • 22 Nov 2017 • Benjamin Scellier, Yoshua Bengio
Recurrent Backpropagation and Equilibrium Propagation are supervised learning algorithms for fixed point recurrent neural networks which differ in their second phase.
1 code implementation • NeurIPS 2017 • Francis Dutil, Caglar Gulcehre, Adam Trischler, Yoshua Bengio
We investigate the integration of a planning mechanism into sequence-to-sequence models using attention.
1 code implementation • 30 Nov 2017 • Jason Jo, Yoshua Bengio
The goal of this article is to measure the tendency of CNNs to learn surface statistical regularities of the dataset.
1 code implementation • 6 Dec 2017 • Rithesh Kumar, Jose Sotelo, Kundan Kumar, Alexandre de Brebisson, Yoshua Bengio
We present ObamaNet, the first architecture that generates both audio and synchronized photo-realistic lip-sync videos from any new text.
no code implementations • NeurIPS 2017 • Alex Lamb, Devon Hjelm, Yaroslav Ganin, Joseph Paul Cohen, Aaron Courville, Yoshua Bengio
Directed latent variable models that formulate the joint distribution as $p(x, z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling.
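Fast, exact ancestral sampling from such a directed model is a one-pass procedure: draw z from the prior, then x from the conditional. A toy instance (the Gaussian choices here are illustrative assumptions of mine, not the paper's model):

```python
import numpy as np

def ancestral_sample(n, rng):
    """Ancestral sampling from p(x, z) = p(z) p(x | z), with the
    illustrative choice z ~ N(0, 1) and x | z ~ N(2z, 0.5**2)."""
    z = rng.standard_normal(n)           # sample the prior p(z)
    x = 2.0 * z + 0.5 * rng.standard_normal(n)  # sample p(x | z)
    return z, x
```

Marginally, x here is Gaussian with variance 4 + 0.25 = 4.25, which a large sample recovers; the point is that no Markov chain or iterative inference is needed to draw exact samples.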
1 code implementation • 30 Dec 2017 • João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn
Animal behaviour depends on learning to associate sensory stimuli with the desired motor command.
no code implementations • ICLR 2018 • Benjamin Scellier, Anirudh Goyal, Jonathan Binas, Thomas Mesnard, Yoshua Bengio
The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists.
no code implementations • ICLR 2018 • R. Devon Hjelm, Athul Paul Jacob, Adam Trischler, Gerry Che, Kyunghyun Cho, Yoshua Bengio
We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator.
no code implementations • ICLR 2018 • Brady Neal, Alex Lamb, Sherjil Ozair, Devon Hjelm, Aaron Courville, Yoshua Bengio, Ioannis Mitliagkas
One of the most successful techniques in generative models has been decomposing a complicated generation task into a series of simpler generation tasks.
no code implementations • ICLR 2018 • Tong Che, Yuchen Lu, George Tucker, Surya Bhupatiraju, Shane Gu, Sergey Levine, Yoshua Bengio
Model-free deep reinforcement learning algorithms are able to successfully solve a wide range of continuous control tasks, but typically require many on-policy samples to achieve good performance.