Search Results for author: Yoshua Bengio

Found 576 papers, 295 papers with code

Learning the 2-D Topology of Images

no code implementations NeurIPS 2007 Nicolas L. Roux, Yoshua Bengio, Pascal Lamblin, Marc Joliveau, Balázs Kégl

We study the following question: is the two-dimensional structure of images a very strong prior or is it something that can be learned with a few examples of natural images?

Topmoumoute Online Natural Gradient Algorithm

no code implementations NeurIPS 2007 Nicolas L. Roux, Pierre-Antoine Manzagol, Yoshua Bengio

Guided by the goal of obtaining an optimization algorithm that is both fast and yielding good generalization, we study the descent direction maximizing the decrease in generalization error or the probability of not increasing generalization error.

Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

no code implementations NeurIPS 2009 Yoshua Bengio, James S. Bergstra

We introduce a new type of neural network activation function based on recent physiological rate models for complex cells in visual area V1.

An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism

no code implementations NeurIPS 2009 Douglas Eck, Yoshua Bengio, Aaron C. Courville

The Indian Buffet Process is a Bayesian nonparametric approach that models objects as arising from an infinite number of latent factors.

Understanding the difficulty of training deep feedforward neural networks

no code implementations13 May 2010 Xavier Glorot, Yoshua Bengio

Whereas before 2006 it appears that deep multi-layer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures.
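The practical upshot of this paper is the "normalized" initialization now widely known as Glorot/Xavier initialization. A minimal NumPy sketch (the helper name glorot_uniform is ours):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Normalized ("Xavier") initialization from Glorot & Bengio (2010):
    keeps activation and gradient variances roughly constant across layers."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = glorot_uniform(784, 256)
print(W.std())  # close to sqrt(2 / (fan_in + fan_out))
```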

Adaptive Drift-Diffusion Process to Learn Time Intervals

1 code implementation11 Mar 2011 Francois Rivest, Yoshua Bengio

We provide an analytical proof that the model can learn inter-event intervals in a number of trials independent of the interval size and that the temporal precision of the system is proportional to the timed interval.

Algorithms for Hyper-Parameter Optimization

no code implementations NeurIPS 2011 James S. Bergstra, Rémi Bardenet, Yoshua Bengio, Balázs Kégl

Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs.

Image Classification
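For reference, the random-search baseline that the proposed sequential methods are compared against amounts to independent sampling from a prior over the search space; a sketch with a hypothetical stand-in objective:

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(lr, n_hidden):
    # Hypothetical stand-in for the validation error of a trained model.
    return (np.log10(lr) + 2.5) ** 2 + (n_hidden - 300) ** 2 / 1e5

# Random search: sample hyper-parameters independently from a prior.
trials = [{"lr": 10 ** rng.uniform(-5, 0), "n_hidden": rng.integers(50, 1000)}
          for _ in range(50)]
best = min(trials, key=lambda t: objective(t["lr"], t["n_hidden"]))
print(best)
```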

On Tracking The Partition Function

no code implementations NeurIPS 2011 Guillaume Desjardins, Yoshua Bengio, Aaron C. Courville

In this paper, we exploit the gradient descent training procedure of restricted Boltzmann machines (a type of MRF) to {\bf track} the log partition function during learning.

Shallow vs. Deep Sum-Product Networks

no code implementations NeurIPS 2011 Olivier Delalleau, Yoshua Bengio

We investigate the representational power of sum-product networks (computation networks analogous to neural networks, but whose individual units compute either products or weighted sums), through a theoretical analysis that compares deep (multiple hidden layers) vs. shallow (one hidden layer) architectures.

Representation Learning: A Review and New Perspectives

5 code implementations24 Jun 2012 Yoshua Bengio, Aaron Courville, Pascal Vincent

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data.

Density Estimation Representation Learning

Practical recommendations for gradient-based training of deep architectures

14 code implementations24 Jun 2012 Yoshua Bengio

Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters.

Better Mixing via Deep Representations

no code implementations18 Jul 2012 Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, Salah Rifai

It has previously been hypothesized, and supported with some experimental evidence, that deeper representations, when well trained, tend to do a better job at disentangling the underlying factors of variation.

Efficient EM Training of Gaussian Mixtures with Missing Data

1 code implementation4 Sep 2012 Olivier Delalleau, Aaron Courville, Yoshua Bengio

In data-mining applications, we are frequently faced with a large fraction of missing entries in the data matrix, which is problematic for most discriminant machine learning algorithms.

Disentangling Factors of Variation via Generative Entangling

no code implementations19 Oct 2012 Guillaume Desjardins, Aaron Courville, Yoshua Bengio

Seen from a generative perspective, the multiplicative interactions emulate the entangling of factors of variation.

General Classification

What Regularized Auto-Encoders Learn from the Data Generating Distribution

no code implementations18 Nov 2012 Guillaume Alain, Yoshua Bengio

This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density.

Denoising

On the difficulty of training Recurrent Neural Networks

no code implementations21 Nov 2012 Razvan Pascanu, Tomas Mikolov, Yoshua Bengio

There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994).
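For the exploding-gradient problem, the remedy proposed in this paper is gradient-norm clipping: rescale the whole gradient whenever its norm exceeds a threshold. A minimal NumPy sketch:

```python
import numpy as np

def clip_gradient_norm(grads, threshold=1.0):
    """Rescale all gradients jointly when their global norm exceeds
    the threshold, as proposed against exploding gradients."""
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if norm > threshold:
        grads = [g * (threshold / norm) for g in grads]
    return grads

grads = [np.full((3, 3), 10.0), np.full(3, 10.0)]
clipped = clip_gradient_norm(grads, threshold=5.0)
print(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))  # ~5.0
```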

Theano: new features and speed improvements

no code implementations23 Nov 2012 Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, Yoshua Bengio

Theano is a linear algebra compiler that optimizes a user's symbolically-specified mathematical computations to produce efficient low-level implementations.

BIG-bench Machine Learning

Advances in Optimizing Recurrent Networks

no code implementations4 Dec 2012 Yoshua Bengio, Nicolas Boulanger-Lewandowski, Razvan Pascanu

After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, several new developments will be reviewed here that have allowed substantial progress both in understanding and in technical solutions towards more efficient training of recurrent networks.

A Semantic Matching Energy Function for Learning with Multi-relational Data

no code implementations15 Jan 2013 Xavier Glorot, Antoine Bordes, Jason Weston, Yoshua Bengio

Large-scale relational learning becomes crucial for handling the huge amounts of structured data generated daily in many application domains ranging from computational biology or information retrieval, to natural language processing.

Information Retrieval Link Prediction +2

Revisiting Natural Gradient for Deep Networks

no code implementations16 Jan 2013 Razvan Pascanu, Yoshua Bengio

We evaluate natural gradient, an algorithm originally proposed in Amari (1997), for learning deep models.

Knowledge Matters: Importance of Prior Information for Optimization

1 code implementation17 Jan 2013 Çağlar Gülçehre, Yoshua Bengio

We explore the effect of introducing prior information into the intermediate level of neural networks for a learning task on which all the state-of-the-art machine learning algorithms tested failed to learn.

Unsupervised Pre-training

Maxout Networks

7 code implementations18 Feb 2013 Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio

We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout.

General Classification Image Classification
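The maxout unit itself is a pointwise maximum over k learned affine pieces, which makes it a natural companion to dropout's model averaging. A NumPy sketch (shapes chosen for illustration):

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit: elementwise max over k affine pieces.
    x: (d,), W: (k, d, m), b: (k, m) -> output (m,)."""
    return np.max(np.einsum("d,kdm->km", x, W) + b, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W = rng.normal(size=(3, 4, 2))   # k=3 pieces, 4 inputs, 2 outputs
b = rng.normal(size=(3, 2))
print(maxout(x, W, b))           # shape (2,)
```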

Deep Learning of Representations: Looking Forward

no code implementations2 May 2013 Yoshua Bengio

Deep learning research aims at discovering learning algorithms that discover multiple levels of distributed representations, with higher levels representing more abstract concepts.

Estimating or Propagating Gradients Through Stochastic Neurons

no code implementations14 May 2013 Yoshua Bengio

The second approach we propose assumes that an estimator of the gradient can be back-propagated; it provides an unbiased estimator of the gradient, but it only works with non-linearities that, unlike the hard threshold but like the rectifier, are not flat over their entire range.

Generalized Denoising Auto-Encoders as Generative Models

1 code implementation NeurIPS 2013 Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent

Recent work has shown how denoising and contractive autoencoders implicitly capture the structure of the data-generating density, in the case where the corruption noise is Gaussian, the reconstruction error is the squared error, and the data is continuous-valued.

Denoising

Deep Generative Stochastic Networks Trainable by Backprop

3 code implementations5 Jun 2013 Yoshua Bengio, Éric Thibodeau-Laufer, Guillaume Alain, Jason Yosinski

We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood.

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

1 code implementation15 Aug 2013 Yoshua Bengio, Nicholas Léonard, Aaron Courville

Stochastic neurons and hard non-linearities can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic or non-smooth neurons?
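The best-known of the estimators discussed here is the straight-through estimator: apply the hard non-linearity in the forward pass, then back-propagate as if it had been the identity. A minimal sketch (the saturation gate |a| <= 1 is one common variant, not a detail taken from this abstract):

```python
import numpy as np

def hard_threshold_forward(a):
    return (a > 0).astype(a.dtype)          # non-differentiable forward

def straight_through_backward(grad_out, a):
    """Straight-through estimator: treat the threshold as the identity
    in the backward pass, optionally gated to the region |a| <= 1."""
    return grad_out * (np.abs(a) <= 1.0)

a = np.array([-2.0, -0.3, 0.4, 1.7])
print(hard_threshold_forward(a))
print(straight_through_backward(np.ones_like(a), a))
```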

Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks

no code implementations7 Nov 2013 Caglar Gulcehre, Kyunghyun Cho, Razvan Pascanu, Yoshua Bengio

In this paper we propose and investigate a novel nonlinear unit, called $L_p$ unit, for deep neural networks.

Object Recognition

Bounding the Test Log-Likelihood of Generative Models

no code implementations24 Nov 2013 Yoshua Bengio, Li Yao, Kyunghyun Cho

Several interesting generative learning algorithms involve a complex probability distribution over many random variables, involving intractable normalization constants or latent variable marginalization.

Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs

no code implementations NeurIPS 2013 Yann Dauphin, Yoshua Bengio

Sparse high-dimensional data vectors are common in many application domains where a very large number of rarely non-zero features can be devised.

text-classification Text Classification +1

On the Challenges of Physical Implementations of RBMs

no code implementations18 Dec 2013 Vincent Dumoulin, Ian J. Goodfellow, Aaron Courville, Yoshua Bengio

Restricted Boltzmann machines (RBMs) are powerful machine learning models, but learning and some kinds of inference in the model require sampling-based approximations, which, in classical digital computers, are implemented using expensive MCMC.

Multimodal Transitions for Generative Stochastic Networks

no code implementations19 Dec 2013 Sherjil Ozair, Li Yao, Yoshua Bengio

Generative Stochastic Networks (GSNs) have been recently introduced as an alternative to traditional probabilistic modeling: instead of parametrizing the data distribution directly, one parametrizes a transition operator for a Markov chain whose stationary distribution is an estimator of the data generating distribution.

On the number of response regions of deep feed forward networks with piece-wise linear activations

no code implementations20 Dec 2013 Razvan Pascanu, Guido Montufar, Yoshua Bengio

For a $k$ layer model with $n$ hidden units on each layer it is $\Omega(\left\lfloor {n}/{n_0}\right\rfloor^{k-1}n^{n_0})$.
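To make the asymptotics concrete, the lower bound can be evaluated directly (a sketch that drops the constant hidden in the Omega):

```python
def region_lower_bound(k, n, n0):
    """Evaluate the paper's lower bound
    floor(n / n0)^(k-1) * n^n0 for a k-layer network with n hidden
    units per layer and n0 inputs (asymptotic constant omitted)."""
    return (n // n0) ** (k - 1) * n ** n0

# Depth multiplies the count exponentially, width only polynomially.
for k in (2, 3, 4):
    print(k, region_lower_bound(k, n=8, n0=2))  # 256, 1024, 4096
```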

How to Construct Deep Recurrent Neural Networks

no code implementations20 Dec 2013 Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio

Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996).

Language Modelling

An empirical analysis of dropout in piecewise linear networks

no code implementations21 Dec 2013 David Warde-Farley, Ian J. Goodfellow, Aaron Courville, Yoshua Bengio

The recently introduced dropout training criterion for neural networks has been the subject of much attention due to its simplicity and remarkable effectiveness as a regularizer, as well as its interpretation as a training procedure for an exponentially large ensemble of networks that share parameters.

On the Number of Linear Regions of Deep Neural Networks

no code implementations NeurIPS 2014 Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio

We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have.

On the saddle point problem for non-convex optimization

no code implementations19 May 2014 Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for the ability of these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.

Iterative Neural Autoregressive Distribution Estimator (NADE-k)

1 code implementation5 Jun 2014 Tapani Raiko, Li Yao, Kyunghyun Cho, Yoshua Bengio

Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data.

Density Estimation Image Generation +1

Generative Adversarial Networks

183 code implementations Proceedings of the 27th International Conference on Neural Information Processing Systems 2014 Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.

Super-Resolution Time-Series Few-Shot Learning with Heterogeneous Channels
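The minimax game described above translates almost line-for-line into code. A hedged PyTorch sketch on a toy 1-D density (the non-saturating generator loss used below is the practical variant the paper itself recommends):

```python
import torch
import torch.nn as nn

# Toy setup: G maps noise to 1-D samples; D classifies real vs. generated.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0          # data: N(2, 0.25)
    fake = G(torch.randn(64, 8))

    # D maximizes log D(real) + log(1 - D(fake)); detach() keeps this
    # update from flowing back into G.
    loss_d = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # G maximizes the probability that D makes a mistake.
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(torch.randn(256, 8)).mean().item())  # should drift toward 2.0
```

Both networks are deliberately tiny; the point is only the alternating update and the stop-gradient between the two players.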

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

4 code implementations NeurIPS 2014 Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.

Reweighted Wake-Sleep

2 code implementations11 Jun 2014 Jörg Bornschein, Yoshua Bengio

The wake-sleep algorithm relies on training not just the directed generative model but also a conditional generative model (the inference network) that runs backward from visible to latent, estimating the posterior distribution of latent given visible.

Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning

no code implementations28 Jun 2014 Kyunghyun Cho, Yoshua Bengio

Conditional computation has been proposed as a way to increase the capacity of a deep neural network without increasing the amount of computation required, by activating some parameters and computation "on-demand", on a per-example basis.

How Auto-Encoders Could Provide Credit Assignment in Deep Networks via Target Propagation

no code implementations29 Jul 2014 Yoshua Bengio

We propose to exploit {\em reconstruction} as a layer-local training signal for deep learning.

On the Equivalence Between Deep NADE and Generative Stochastic Networks

no code implementations2 Sep 2014 Li Yao, Sherjil Ozair, Kyunghyun Cho, Yoshua Bengio

Orderless NADEs are trained based on a criterion that stochastically maximizes $P(\mathbf{x})$ with all possible orders of factorizations.

Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation

no code implementations WS 2014 Jean Pouget-Abadie, Dzmitry Bahdanau, Bart van Merrienboer, Kyunghyun Cho, Yoshua Bengio

The authors of (Cho et al., 2014a) have shown that the recently introduced neural network translation systems suffer from a significant drop in translation quality when translating long sentences, unlike existing phrase-based translation systems.

Machine Translation Sentence +1

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches

2 code implementations3 Sep 2014 Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio

In this paper, we focus on analyzing the properties of the neural machine translation using two models: RNN Encoder--Decoder and a newly proposed gated recursive convolutional neural network.

Decoder Machine Translation +2

Deep Tempering

no code implementations1 Oct 2014 Guillaume Desjardins, Heng Luo, Aaron Courville, Yoshua Bengio

Restricted Boltzmann Machines (RBMs) are one of the fundamental building blocks of deep learning.

Deep Directed Generative Autoencoders

no code implementations2 Oct 2014 Sherjil Ozair, Yoshua Bengio

The objective is to learn an encoder $f(\cdot)$ that maps $X$ to $f(X)$ that has a much simpler distribution than $X$ itself, estimated by $P(H)$.

Decoder

Not All Neural Embeddings are Born Equal

no code implementations2 Oct 2014 Felix Hill, Kyunghyun Cho, Sebastien Jean, Coline Devin, Yoshua Bengio

Neural language models learn word representations that capture rich linguistic and conceptual information.

Machine Translation Translation

BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

2 code implementations9 Oct 2014 Stephan Gouws, Yoshua Bengio, Greg Corrado

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data.

Cross-Lingual Document Classification Document Classification +3

NICE: Non-linear Independent Components Estimation

19 code implementations30 Oct 2014 Laurent Dinh, David Krueger, Yoshua Bengio

It is based on the idea that a good representation is one in which the data has a distribution that is easy to model.

Ranked #73 on Image Generation on CIFAR-10 (bits/dimension metric)

Image Generation
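NICE builds its easy-to-model representation from additive coupling layers, which are invertible by construction and have a unit-determinant Jacobian; a toy NumPy sketch (the tanh map stands in for the paper's learned coupling MLP):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2)) * 0.1  # toy coupling function m(x1) = tanh(x1 @ W)

def coupling_forward(x):
    """Additive coupling layer (NICE): y1 = x1, y2 = x2 + m(x1).
    The Jacobian is triangular with unit diagonal, so log|det| = 0."""
    x1, x2 = x[:2], x[2:]
    return np.concatenate([x1, x2 + np.tanh(x1 @ W)])

def coupling_inverse(y):
    y1, y2 = y[:2], y[2:]
    return np.concatenate([y1, y2 - np.tanh(y1 @ W)])

x = rng.normal(size=4)
print(np.allclose(coupling_inverse(coupling_forward(x)), x))  # True
```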

How transferable are features in deep neural networks?

3 code implementations NeurIPS 2014 Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson

Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks.

Specificity

Iterative Neural Autoregressive Distribution Estimator NADE-k

1 code implementation NeurIPS 2014 Tapani Raiko, Li Yao, Kyunghyun Cho, Yoshua Bengio

Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data.

Density Estimation Image Generation +1

Generative Adversarial Nets

1 code implementation NeurIPS 2014 Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

We propose a new framework for estimating generative models via adversarial nets, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.

End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results

no code implementations4 Dec 2014 Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

We replace the Hidden Markov Model (HMM) which is traditionally used in continuous speech recognition with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes.

Decoder speech-recognition +1

On Using Very Large Target Vocabulary for Neural Machine Translation

1 code implementation IJCNLP 2015 Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio

The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models.

Machine Translation Translation

Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

4 code implementations17 Dec 2014 Grégoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio

Sentiment analysis is a common task in natural language processing that aims to detect polarity of a text document (typically a consumer review).

Binary Classification General Classification +1

FitNets: Hints for Thin Deep Nets

3 code implementations19 Dec 2014 Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio

In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.

Knowledge Distillation

Embedding Word Similarity with Neural Machine Translation

no code implementations19 Dec 2014 Felix Hill, Kyunghyun Cho, Sebastien Jean, Coline Devin, Yoshua Bengio

Here we investigate the embeddings learned by neural machine translation models, a recently-developed class of neural language model.

Language Modelling Machine Translation +2

Training deep neural networks with low precision multiplications

1 code implementation22 Dec 2014 Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David

For each of those datasets and for each of those formats, we assess the impact of the precision of the multiplications on the final error after training.

Difference Target Propagation

1 code implementation23 Dec 2014 Dong-Hyun Lee, Saizheng Zhang, Asja Fischer, Yoshua Bengio

Back-propagation has been the workhorse of recent successes of deep learning but it relies on infinitesimal effects (partial derivatives) in order to perform credit assignment.

ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient

no code implementations23 Dec 2014 Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio

The convergence of SGD depends on the careful choice of learning rate and the amount of the noise in stochastic estimates of the gradients.

Gated Feedback Recurrent Neural Networks

no code implementations9 Feb 2015 Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio

In this work, we propose a novel recurrent neural network (RNN) architecture.

Language Modelling

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

88 code implementations10 Feb 2015 Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio

Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images.

Caption Generation Image Captioning +1
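Deterministic "soft" attention reduces to a softmax over per-location scores followed by a weighted average of the CNN annotation vectors. A NumPy sketch with additive scoring (all dimensions here are illustrative):

```python
import numpy as np

def soft_attention(annotations, query, W_a, W_q, v):
    """Soft attention in the spirit of Show, Attend and Tell: score each
    spatial annotation vector against the decoder state, softmax the
    scores, and return the expected (weighted-average) context vector."""
    scores = np.tanh(annotations @ W_a + query @ W_q) @ v   # (L,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                    # attention weights
    return alpha @ annotations, alpha                       # context, weights

rng = np.random.default_rng(0)
a = rng.normal(size=(196, 512))          # 14x14 CNN feature map, flattened
h = rng.normal(size=256)                 # decoder hidden state
ctx, alpha = soft_attention(a, h, rng.normal(size=(512, 128)),
                            rng.normal(size=(256, 128)), rng.normal(size=128))
print(ctx.shape, alpha.sum())            # (512,) 1.0
```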

Towards Biologically Plausible Deep Learning

no code implementations14 Feb 2015 Yoshua Bengio, Dong-Hyun Lee, Jorg Bornschein, Thomas Mesnard, Zhouhan Lin

Neuroscientists have long criticised deep learning algorithms as incompatible with current knowledge of neurobiology.

Denoising Representation Learning

Equilibrated adaptive learning rates for non-convex optimization

2 code implementations NeurIPS 2015 Yann N. Dauphin, Harm de Vries, Yoshua Bengio

Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks.

On Using Monolingual Corpora in Neural Machine Translation

no code implementations11 Mar 2015 Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio

Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation.

Machine Translation Translation

GSNs : Generative Stochastic Networks

no code implementations18 Mar 2015 Guillaume Alain, Yoshua Bengio, Li Yao, Jason Yosinski, Eric Thibodeau-Laufer, Saizheng Zhang, Pascal Vincent

We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood.

Denoising

Learning to Understand Phrases by Embedding the Dictionary

2 code implementations TACL 2016 Felix Hill, Kyunghyun Cho, Anna Korhonen, Yoshua Bengio

Distributional models that learn rich semantic word representations are a success story of recent NLP research.

General Knowledge

A Recurrent Latent Variable Model for Sequential Data

5 code implementations NeurIPS 2015 Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio

In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder.

Bidirectional Helmholtz Machines

1 code implementation12 Jun 2015 Jorg Bornschein, Samira Shabanian, Asja Fischer, Yoshua Bengio

We present a lower-bound for the likelihood of this model and we show that optimizing this bound regularizes the model so that the Bhattacharyya distance between the bottom-up and top-down approximate distributions is minimized.

Attention-Based Models for Speech Recognition

14 code implementations NeurIPS 2015 Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio

Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation.

Machine Translation Speech Recognition +1

Describing Multimedia Content using Attention-based Encoder--Decoder Networks

no code implementations4 Jul 2015 Kyunghyun Cho, Aaron Courville, Yoshua Bengio

Whereas deep neural networks were first mostly used for classification tasks, they are rapidly expanding in the realm of structured output problems, where the observed target is composed of multiple random variables that have a rich joint distribution, given the input.

Caption Generation Decoder +4

A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion

4 code implementations8 Jul 2015 Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob G. Simonsen, Jian-Yun Nie

Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity.

Decoder

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

7 code implementations17 Jul 2015 Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau

We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models.

Decoder Word Embeddings

Clustering is Efficient for Approximate Maximum Inner Product Search

no code implementations21 Jul 2015 Alex Auvolat, Sarath Chandar, Pascal Vincent, Hugo Larochelle, Yoshua Bengio

Efficient Maximum Inner Product Search (MIPS) is an important task that has a wide applicability in recommendation systems and classification with a large number of classes.

Clustering Recommendation Systems +2

Artificial Neural Networks Applied to Taxi Destination Prediction

1 code implementation31 Jul 2015 Alexandre de Brébisson, Étienne Simon, Alex Auvolat, Pascal Vincent, Yoshua Bengio

We describe our first-place solution to the ECML/PKDD discovery challenge on taxi destination prediction.

End-to-End Attention-based Large Vocabulary Speech Recognition

1 code implementation18 Aug 2015 Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio

Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs).

Acoustic Modelling Language Modelling +2

STDP as presynaptic activity times rate of change of postsynaptic activity

no code implementations19 Sep 2015 Yoshua Bengio, Thomas Mesnard, Asja Fischer, Saizheng Zhang, Yuhuai Wu

We introduce a weight update formula that is expressed only in terms of firing rates and their derivatives and that results in changes consistent with those associated with spike-timing dependent plasticity (STDP) rules and biological observations, even though the explicit timing of spikes is not needed.

Batch Normalized Recurrent Neural Networks

no code implementations5 Oct 2015 César Laurent, Gabriel Pereyra, Philémon Brakel, Ying Zhang, Yoshua Bengio

Recurrent Neural Networks (RNNs) are powerful models for sequential data that have the potential to learn long-term dependencies.

Language Modelling speech-recognition +1

Early Inference in Energy-Based Models Approximates Back-Propagation

no code implementations9 Oct 2015 Yoshua Bengio, Asja Fischer

We show that Langevin MCMC inference in an energy-based model with latent variables has the property that the early steps of inference, starting from a stationary point, correspond to propagating error gradients into internal layers, similarly to back-propagation.

BinaryConnect: Training Deep Neural Networks with binary weights during propagations

5 code implementations NeurIPS 2015 Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David

We introduce BinaryConnect, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated.
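The core of BinaryConnect fits in a few lines: binarize on the fly for the propagations, but accumulate gradient updates in a full-precision copy of the weights. A hedged NumPy sketch on a toy linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
W_real = rng.uniform(-1, 1, size=(4, 3))   # full-precision "master" weights

def binarize(W):
    return np.where(W >= 0, 1.0, -1.0)      # deterministic sign binarization

for step in range(100):
    W_bin = binarize(W_real)                # binary weights for forward/backward
    x, target = rng.normal(size=4), rng.normal(size=3)
    err = x @ W_bin - target
    grad = np.outer(x, err)                 # gradient w.r.t. the (binary) weights
    W_real -= 0.01 * grad                   # ...accumulated in full precision
    W_real = np.clip(W_real, -1, 1)         # keep master weights in [-1, 1]

print(binarize(W_real))
```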

Oracle performance for visual captioning

1 code implementation14 Nov 2015 Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio

The task of associating images and videos with a natural language description has attracted a great amount of attention recently.

Image Captioning Language Modelling +1

Deconstructing the Ladder Network Architecture

no code implementations19 Nov 2015 Mohammad Pezeshki, Linxi Fan, Philemon Brakel, Aaron Courville, Yoshua Bengio

Although the empirical results are impressive, the Ladder Network has many components intertwined, whose contributions are not obvious in such a complex architecture.

Denoising

Denoising Criterion for Variational Auto-Encoding Framework

no code implementations19 Nov 2015 Daniel Jiwoong Im, Sungjin Ahn, Roland Memisevic, Yoshua Bengio

Denoising autoencoders (DAE) are trained to reconstruct their clean inputs with noise injected at the input level, while variational autoencoders (VAE) are trained with noise injected in their stochastic hidden layer, with a regularizer that encourages this noise injection.

Denoising

Task Loss Estimation for Sequence Prediction

1 code implementation19 Nov 2015 Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio

Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss.

Decoder Language Modelling +2

Unitary Evolution Recurrent Neural Networks

2 code implementations20 Nov 2015 Martin Arjovsky, Amar Shah, Yoshua Bengio

When the eigenvalues of the hidden to hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies.

Sequential Image Classification

Variance Reduction in SGD by Distributed Importance Sampling

1 code implementation20 Nov 2015 Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, Yoshua Bengio

This leads the model to update using an unbiased estimate of the gradient which also has minimum variance when the sampling proposal is proportional to the L2-norm of the gradient.
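The unbiased reweighting is easy to state in code: sample example i with probability p_i proportional to its gradient norm, then scale its gradient by 1/(N p_i). The sketch below computes every per-example gradient, which defeats the purpose (hence the paper's distributed design), but it shows the estimator staying unbiased:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1., -2., 0.5, 0., 3.]) + 0.1 * rng.normal(size=1000)
w = np.zeros(5)

for step in range(500):
    # Per-example gradient norms define the importance-sampling proposal.
    g = (X @ w - y)[:, None] * X                 # per-example gradients
    p = np.linalg.norm(g, axis=1)
    p /= p.sum()
    i = rng.choice(len(X), p=p)
    # Reweight by 1/(N * p_i) so the expected update is the mean gradient.
    w -= 0.01 * g[i] / (len(X) * p[i])

print(w)  # drifts toward [1, -2, 0.5, 0, 3]
```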

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

26 code implementations9 Feb 2016 Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time.

Equilibrium Propagation: Bridging the Gap Between Energy-Based Models and Backpropagation

2 code implementations16 Feb 2016 Benjamin Scellier, Yoshua Bengio

Because the objective function is defined in terms of local perturbations, the second phase of Equilibrium Propagation corresponds to only nudging the prediction (fixed point, or stationary distribution) towards a configuration that reduces prediction error.

Noisy Activation Functions

1 code implementation1 Mar 2016 Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio

Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only).

A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation

2 code implementations ACL 2016 Junyoung Chung, Kyunghyun Cho, Yoshua Bengio

The existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation.

Decoder Machine Translation +2

Pointing the Unknown Words

no code implementations ACL 2016 Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bo-Wen Zhou, Yoshua Bengio

At each time-step, the decision of which softmax layer to use is made adaptively by an MLP conditioned on the context. We motivate our work with psychological evidence that humans naturally tend to point towards objects in the context or the environment when the name of an object is not known. We observe improvements on two tasks: neural machine translation on the Europarl English-to-French parallel corpora and text summarization on the Gigaword dataset using our proposed model.

Machine Translation Sentence +2

Theano: A Python framework for fast computation of mathematical expressions

1 code implementation9 May 2016 The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang

Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.

BIG-bench Machine Learning Clustering +2

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

9 code implementations19 May 2016 Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio

Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue.

Decoder Response Generation

Hierarchical Memory Networks

no code implementations24 May 2016 Sarath Chandar, Sungjin Ahn, Hugo Larochelle, Pascal Vincent, Gerald Tesauro, Yoshua Bengio

In this paper, we explore a form of hierarchical memory network, which can be considered as a hybrid between hard and soft attention memory networks.

Hard Attention Question Answering

Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

4 code implementations2 Jun 2016 Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bo-Wen Zhou, Yoshua Bengio, Aaron Courville

We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens.

Dialogue Generation Response Generation

Feedforward Initialization for Fast Inference of Deep Generative Networks is biologically plausible

no code implementations6 Jun 2016 Yoshua Bengio, Benjamin Scellier, Olexa Bilaniuk, Joao Sacramento, Walter Senn

We find conditions under which a simple feedforward computation is a very good initialization for inference, after the input units are clamped to observed values.

Iterative Alternating Neural Attention for Machine Reading

1 code implementation7 Jun 2016 Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio

We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document.

Ranked #3 on Question Answering on Children's Book Test (Accuracy-NE metric)

Question Answering Reading Comprehension

Deep Directed Generative Models with Energy-Based Probability Estimation

no code implementations10 Jun 2016 Taesup Kim, Yoshua Bengio

Training energy-based probabilistic models is confronted with apparently intractable sums, whose Monte Carlo estimation requires sampling from the estimated probability distribution in the inner loop of training.

Online and Offline Handwritten Chinese Character Recognition: A Comprehensive Study and New Benchmark

no code implementations18 Jun 2016 Xu-Yao Zhang, Yoshua Bengio, Cheng-Lin Liu

Furthermore, although directMap+convNet can achieve the best results and surpass human-level performance, we show that writer adaptation in this case is still effective.

Data Augmentation Offline Handwritten Chinese Character Recognition

Drawing and Recognizing Chinese Characters with Recurrent Neural Network

1 code implementation21 Jun 2016 Xu-Yao Zhang, Fei Yin, Yan-Ming Zhang, Cheng-Lin Liu, Yoshua Bengio

In this paper, we propose a framework by using the recurrent neural network (RNN) as both a discriminative model for recognizing Chinese characters and a generative model for drawing (generating) Chinese characters.

Handwriting Recognition

On Multiplicative Integration with Recurrent Neural Networks

no code implementations NeurIPS 2016 Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, Ruslan Salakhutdinov

We introduce a general and simple structural design called Multiplicative Integration (MI) to improve recurrent neural networks (RNNs).

Language Modelling

Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes

no code implementations30 Jun 2016 Caglar Gulcehre, Sarath Chandar, Kyunghyun Cho, Yoshua Bengio

We investigate the mechanisms and effects of learning to read from and write to a memory through experiments on the Facebook bAbI tasks, using both a feedforward and a GRU controller.

Natural Language Inference Question Answering

Context-Dependent Word Representation for Neural Machine Translation

1 code implementation3 Jul 2016 Heeyoul Choi, Kyunghyun Cho, Yoshua Bengio

Based on this observation, in this paper we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence.

Decoder Machine Translation +2

HeMIS: Hetero-Modal Image Segmentation

1 code implementation18 Jul 2016 Mohammad Havaei, Nicolas Guizard, Nicolas Chapados, Yoshua Bengio

We introduce a deep learning image segmentation framework that is extremely robust to missing imaging modalities.

Image Segmentation Imputation +2

A Neural Knowledge Language Model

no code implementations1 Aug 2016 Sungjin Ahn, Heeyoul Choi, Tanel Pärnamaa, Yoshua Bengio

Current language models have a significant limitation in the ability to encode and decode factual knowledge.

Language Modelling

Mollifying Networks

no code implementations17 Aug 2016 Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio

The optimization of deep neural networks can be more challenging than traditional convex optimization problems due to the highly non-convex nature of the loss function, e.g., it can involve pathological landscapes such as saddle-surfaces that can be difficult to escape for algorithms based on simple gradient descent.

Recurrent Neural Networks With Limited Numerical Precision

1 code implementation24 Aug 2016 Joachim Ott, Zhouhan Lin, Ying Zhang, Shih-Chii Liu, Yoshua Bengio

We present results from the use of different stochastic and deterministic reduced precision training methods applied to three major RNN types which are then tested on several datasets.

Binarization

Hierarchical Multiscale Recurrent Neural Networks

3 code implementations6 Sep 2016 Junyoung Chung, Sungjin Ahn, Yoshua Bengio

Multiscale recurrent neural networks have been considered as a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of models can actually capture the temporal dependencies by discovering the latent hierarchical structure of the sequence.

Language Modelling

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

5 code implementations22 Sep 2016 Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved comparable accuracy as their 32-bit counterparts using only 4-bits.

Professor Forcing: A New Algorithm for Training Recurrent Networks

1 code implementation NeurIPS 2016 Alex Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, Yoshua Bengio

We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps.

Domain Adaptation Handwriting generation +2

Recurrent Neural Networks With Limited Numerical Precision

1 code implementation21 Nov 2016 Joachim Ott, Zhouhan Lin, Ying Zhang, Shih-Chii Liu, Yoshua Bengio

Recurrent Neural Networks (RNNs) produce state-of-art performance on many machine learning tasks but their demand on resources in terms of memory and computational power are often high.

Quantization

Invariant Representations for Noisy Speech Recognition

no code implementations27 Nov 2016 Dmitriy Serdyuk, Kartik Audhkhasi, Philémon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio

Ensuring such robustness to variability is a challenge in modern day neural network-based ASR systems, especially when all types of variability are not seen during training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Diet Networks: Thin Parameters for Fat Genomics

5 code implementations28 Nov 2016 Adriana Romero, Pierre Luc Carrier, Akram Erraqabi, Tristan Sylvain, Alex Auvolat, Etienne Dejoie, Marc-André Legault, Marie-Pierre Dubé, Julie G. Hussin, Yoshua Bengio

It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g., for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units).

Parameter Prediction

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space

1 code implementation CVPR 2017 Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski

PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable "condition" network C that tells the generator what to draw.

Image Captioning Image Inpainting

Mode Regularized Generative Adversarial Networks

no code implementations7 Dec 2016 Tong Che, Yan-ran Li, Athul Paul Jacob, Yoshua Bengio, Wenjie Li

Although Generative Adversarial Networks achieve state-of-the-art results on a variety of generative tasks, they are regarded as highly unstable and prone to miss modes.

Generalizable Features From Unsupervised Learning

no code implementations12 Dec 2016 Mehdi Mirza, Aaron Courville, Yoshua Bengio

In this work, we explore the potential of unsupervised learning to find features that promote better generalization to settings outside the supervised training distribution.

Physical Intuition

On Random Weights for Texture Generation in One Layer Neural Networks

no code implementations19 Dec 2016 Mihir Mongia, Kundan Kumar, Akram Erraqabi, Yoshua Bengio

Recent work in the literature has shown experimentally that one can use the lower layers of a trained convolutional neural network (CNN) to model natural textures.

Texture Synthesis

Memory Augmented Neural Networks with Wormhole Connections

no code implementations30 Jan 2017 Caglar Gulcehre, Sarath Chandar, Yoshua Bengio

We use discrete addressing for read/write operations, which helps to substantially reduce the vanishing gradient problem with very long sequences.

Learning Normalized Inputs for Iterative Estimation in Medical Image Segmentation

no code implementations16 Feb 2017 Michal Drozdzal, Gabriel Chartrand, Eugene Vorontsov, Lisa Di Jorio, An Tang, Adriana Romero, Yoshua Bengio, Chris Pal, Samuel Kadoury

Moreover, when applying our 2D pipeline on a challenging 3D MRI prostate segmentation challenge we reach results that are competitive even when compared to 3D methods.

Image Segmentation Medical Image Segmentation +2

Maximum-Likelihood Augmented Discrete Generative Adversarial Networks

no code implementations26 Feb 2017 Tong Che, Yan-ran Li, Ruixiang Zhang, R. Devon Hjelm, Wenjie Li, Yangqiu Song, Yoshua Bengio

Despite the successes in capturing continuous distributions, the application of generative adversarial networks (GANs) to discrete settings, like natural language tasks, is rather restricted.

Boundary-Seeking Generative Adversarial Networks

6 code implementations27 Feb 2017 R. Devon Hjelm, Athul Paul Jacob, Tong Che, Adam Trischler, Kyunghyun Cho, Yoshua Bengio

We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator.

Scene Understanding Text Generation

A Robust Adaptive Stochastic Gradient Method for Deep Learning

1 code implementation2 Mar 2017 Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

The information about the element-wise curvature of the loss function is estimated from the local statistics of the stochastic first order gradients.

Sharp Minima Can Generalize For Deep Nets

no code implementations ICML 2017 Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio

Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice.

Independently Controllable Features

no code implementations22 Mar 2017 Emmanuel Bengio, Valentin Thomas, Joelle Pineau, Doina Precup, Yoshua Bengio

Finding features that disentangle the different causes of variation in real data is a difficult task, that has nonetheless received considerable attention in static domains like natural images.

A network of deep neural networks for distant speech recognition

no code implementations23 Mar 2017 Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially when adverse acoustic conditions characterized by non-stationary noises and reverberation are met.

Distant Speech Recognition Speech Enhancement +1

Image Segmentation by Iterative Inference from Conditional Score Estimation

1 code implementation ICLR 2018 Adriana Romero, Michal Drozdzal, Akram Erraqabi, Simon Jégou, Yoshua Bengio

We experimentally find that the proposed iterative inference from conditional score estimation by conditional denoising autoencoders performs better than comparable models based on CRFs or those not using any explicit modeling of the conditional joint distribution of outputs.

Denoising Image Segmentation +1

Deep Complex Networks

9 code implementations ICLR 2018 Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J. Pal

Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models.

Image Classification Music Transcription +1

Deep Learning for Patient-Specific Kidney Graft Survival Analysis

2 code implementations29 May 2017 Margaux Luck, Tristan Sylvain, Héloïse Cardinal, Andrea Lodi, Yoshua Bengio

An accurate model of patient-specific kidney graft survival distributions can help to improve shared-decision making in the treatment and care of patients.

Decision Making Multi-Task Learning +1

Gated Orthogonal Recurrent Units: On Learning to Forget

1 code implementation8 Jun 2017 Li Jing, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio

We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory.

Ranked #7 on Question Answering on bAbi (Accuracy (trained on 1k) metric)

Denoising Question Answering

Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder

1 code implementation13 Jun 2017 Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio

We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation.

Decoder Machine Translation +1

Variance Regularizing Adversarial Learning

no code implementations ICLR 2018 Karan Grewal, R. Devon Hjelm, Yoshua Bengio

We hypothesize that this approach ensures a non-zero gradient to the generator, even in the limit of a perfect classifier.

Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition

no code implementations19 Jul 2017 Taesup Kim, Inchul Song, Yoshua Bengio

Layer normalization is a recently introduced technique for normalizing the activities of neurons in deep neural networks to improve the training speed and stability.

speech-recognition Speech Recognition

Independently Controllable Factors

no code implementations3 Aug 2017 Valentin Thomas, Jules Pondard, Emmanuel Bengio, Marc Sarfati, Philippe Beaudoin, Marie-Jean Meurs, Joelle Pineau, Doina Precup, Yoshua Bengio

It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation.

Open-Ended Question Answering

The Consciousness Prior

1 code implementation25 Sep 2017 Yoshua Bengio

To the extent that these assumptions are generally true (and the form of natural language seems consistent with them), they can form a useful prior for representation learning.

Decision Making Representation Learning +1

Improving speech recognition by revising gated recurrent units

1 code implementation29 Sep 2017 Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

First, we suggest to remove the reset gate in the GRU design, resulting in a more efficient single-gate architecture.

speech-recognition Speech Recognition

Residual Connections Encourage Iterative Inference

no code implementations ICLR 2018 Stanisław Jastrzębski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio

In general, a Resnet block tends to concentrate representation learning behavior in the first few layers while higher layers perform iterative refinement of features.

Representation Learning

Learning Independent Features with Adversarial Nets for Non-linear ICA

1 code implementation ICLR 2018 Philemon Brakel, Yoshua Bengio

We propose to learn independent features with adversarial objectives which optimize such measures implicitly.

Generalization in Deep Learning

no code implementations16 Oct 2017 Kenji Kawaguchi, Leslie Pack Kaelbling, Yoshua Bengio

This paper provides theoretical insights into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, responding to an open question in the literature.

Open-Ended Question Answering

Graph Attention Networks

90 code implementations ICLR 2018 Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.

 Ranked #1 on Node Classification on Pubmed (Validation metric)

Document Classification Graph Attention +8
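A single attention head is just a masked softmax over edge scores followed by neighbourhood aggregation. A NumPy sketch of one head, using the standard split of the attention vector into source and destination parts:

```python
import numpy as np

def gat_layer(H, A, W, a_src, a_dst):
    """One graph-attention head: score each edge with a shared attention
    vector, softmax over each node's neighbourhood (masked by A), then
    aggregate the transformed neighbour features."""
    Z = H @ W                                        # (N, F')
    scores = Z @ a_src[:, None] + (Z @ a_dst)[None]  # e_ij before activation
    e = np.where(A > 0, np.maximum(0.2 * scores, scores), -np.inf)  # LeakyReLU + mask
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ Z

N, F, Fp = 5, 8, 4
rng = np.random.default_rng(0)
A = np.eye(N) + (rng.random((N, N)) > 0.5)          # adjacency with self-loops
H = rng.normal(size=(N, F))
out = gat_layer(H, A, rng.normal(size=(F, Fp)),
                rng.normal(size=Fp), rng.normal(size=Fp))
print(out.shape)  # (5, 4)
```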

Fraternal Dropout

1 code implementation ICLR 2018 Konrad Zolna, Devansh Arpit, Dendi Suhubdy, Yoshua Bengio

We show that our regularization term is upper bounded by the expectation-linear dropout objective which has been shown to address the gap due to the difference between the train and inference phases of dropout.

Image Captioning Language Modelling

Sparse Attentive Backtracking: Long-Range Credit Assignment in Recurrent Networks

no code implementations ICLR 2018 Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Laurent Charlin, Chris Pal, Yoshua Bengio

A major drawback of backpropagation through time (BPTT) is the difficulty of learning long-term dependencies, coming from having to propagate credit information backwards through every single step of the forward computation.

Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net

1 code implementation NeurIPS 2017 Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio

The energy function is then modified so the model and data distributions match, with no guarantee on the number of steps required for the Markov chain to converge.

Three Factors Influencing Minima in SGD

no code implementations ICLR 2018 Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

In particular we find that the ratio of learning rate to batch size is a key determinant of SGD dynamics and of the width of the final minima, and that higher values of the ratio lead to wider minima and often better generalization.

Memorization Open-Ended Question Answering

ACtuAL: Actor-Critic Under Adversarial Learning

no code implementations13 Nov 2017 Anirudh Goyal, Nan Rosemary Ke, Alex Lamb, R. Devon Hjelm, Chris Pal, Joelle Pineau, Yoshua Bengio

This makes it fundamentally difficult to train GANs with discrete data, as generation in this case typically involves a non-differentiable function.

Language Modelling

Variational Bi-LSTMs

no code implementations ICLR 2018 Samira Shabanian, Devansh Arpit, Adam Trischler, Yoshua Bengio

Bidirectional LSTMs (Bi-LSTMs) on the other hand model sequences along both forward and backward directions and are generally known to perform better at such tasks because they capture a richer representation of the data.

Equivalence of Equilibrium Propagation and Recurrent Backpropagation

1 code implementation22 Nov 2017 Benjamin Scellier, Yoshua Bengio

Recurrent Backpropagation and Equilibrium Propagation are supervised learning algorithms for fixed point recurrent neural networks which differ in their second phase.

Measuring the tendency of CNNs to Learn Surface Statistical Regularities

1 code implementation30 Nov 2017 Jason Jo, Yoshua Bengio

The goal of this article is to measure the tendency of CNNs to learn surface statistical regularities of the dataset.

ObamaNet: Photo-realistic lip-sync from text

1 code implementation6 Dec 2017 Rithesh Kumar, Jose Sotelo, Kundan Kumar, Alexandre de Brebisson, Yoshua Bengio

We present ObamaNet, the first architecture that generates both audio and synchronized photo-realistic lip-sync videos from any new text.

Constrained Lip-synchronization

GibbsNet: Iterative Adversarial Inference for Deep Graphical Models

no code implementations NeurIPS 2017 Alex Lamb, Devon Hjelm, Yaroslav Ganin, Joseph Paul Cohen, Aaron Courville, Yoshua Bengio

Directed latent variable models that formulate the joint distribution as $p(x, z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling.

Attribute

Dendritic error backpropagation in deep cortical microcircuits

1 code implementation30 Dec 2017 João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn

Animal behaviour depends on learning to associate sensory stimuli with the desired motor command.

Denoising

Boundary Seeking GANs

no code implementations ICLR 2018 R. Devon Hjelm, Athul Paul Jacob, Adam Trischler, Gerry Che, Kyunghyun Cho, Yoshua Bengio

We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator.

Scene Understanding Text Generation

Learning Generative Models with Locally Disentangled Latent Factors

no code implementations ICLR 2018 Brady Neal, Alex Lamb, Sherjil Ozair, Devon Hjelm, Aaron Courville, Yoshua Bengio, Ioannis Mitliagkas

One of the most successful techniques in generative models has been decomposing a complicated generation task into a series of simpler generation tasks.

Combining Model-based and Model-free RL via Multi-step Control Variates

no code implementations ICLR 2018 Tong Che, Yuchen Lu, George Tucker, Surya Bhupatiraju, Shane Gu, Sergey Levine, Yoshua Bengio

Model-free deep reinforcement learning algorithms are able to successfully solve a wide range of continuous control tasks, but typically require many on-policy samples to achieve good performance.

Continuous Control OpenAI Gym
