Search Results for author: Yoshua Bengio

Found 576 papers, 295 papers with code

Practical recommendations for gradient-based training of deep architectures

14 code implementations 24 Jun 2012 Yoshua Bengio

Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters.

Graph Attention Networks

90 code implementations ICLR 2018 Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.

Ranked #1 on Node Classification on Pubmed (Validation metric)

Document Classification Graph Attention +8

Generative Adversarial Networks

183 code implementations Proceedings of the 27th International Conference on Neural Information Processing Systems 2014 Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.

Super-Resolution Time-Series Few-Shot Learning with Heterogeneous Channels
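
The two-player objective summarized in the Generative Adversarial Networks abstract above can be illustrated with a minimal sketch. It assumes the standard value function V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which D tries to maximize and G tries to minimize; the function name `gan_value` and the discriminator outputs are hypothetical values chosen only for illustration.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs on real samples (probability of "real").
    d_fake: discriminator outputs on generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical discriminator outputs, not taken from any experiment.
d_on_real = [0.9, 0.8, 0.95]
confident_d = gan_value(d_on_real, d_fake=[0.1, 0.2, 0.05])  # D spots the fakes
fooled_d = gan_value(d_on_real, d_fake=[0.6, 0.7, 0.5])      # G fools D more often
```

In this toy setting the value is lower when the discriminator is fooled more often, which is the direction the generator pushes in.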

On Catastrophic Interference in Atari 2600 Games

1 code implementation 28 Feb 2020 William Fedus, Dibya Ghosh, John D. Martin, Marc G. Bellemare, Yoshua Bengio, Hugo Larochelle

Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.

Atari Games reinforcement-learning +1

Revisiting Fundamentals of Experience Replay

2 code implementations ICML 2020 William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding.

DQN Replay Dataset Q-Learning +1

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

21 code implementations NeurIPS 2019 Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville

In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by introducing a set of architectural changes and simple training techniques.

Speech Synthesis Translation

Boundary-Seeking Generative Adversarial Networks

6 code implementations 27 Feb 2017 R. Devon Hjelm, Athul Paul Jacob, Tong Che, Adam Trischler, Kyunghyun Cho, Yoshua Bengio

We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator.

Scene Understanding Text Generation

Deep Graph Infomax

11 code implementations ICLR 2019 Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, R. Devon Hjelm

We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner.

General Classification Node Classification

Theano: A Python framework for fast computation of mathematical expressions

1 code implementation 9 May 2016 The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang

Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.

BIG-bench Machine Learning Clustering +2

Compositional Attention: Disentangling Search and Retrieval

3 code implementations ICLR 2022 Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie

Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed.

Retrieval

BinaryConnect: Training Deep Neural Networks with binary weights during propagations

5 code implementations NeurIPS 2015 Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David

We introduce BinaryConnect, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated.
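
A minimal sketch of the idea in the BinaryConnect abstract above: the forward and backward passes use binarized weights, while updates accumulate in stored real-valued weights. The single linear unit, the squared-error loss, the sign-based binarization, the clipping to [-1, 1], and all names and numbers here are illustrative assumptions, not the authors' implementation.

```python
def binarize(w):
    """Deterministic binarization: +1 for non-negative weights, -1 otherwise."""
    return [1.0 if wi >= 0 else -1.0 for wi in w]

def binaryconnect_step(w_real, x, target, lr=0.1):
    """One SGD step on a linear unit with loss 0.5 * (y - target)^2."""
    w_bin = binarize(w_real)                      # binary weights used for propagation
    y = sum(wb * xi for wb, xi in zip(w_bin, x))  # forward pass with binary weights
    err = y - target
    grad = [err * xi for xi in x]                 # gradient w.r.t. the binary weights
    # Accumulate the update in the real-valued weights, then clip to [-1, 1].
    return [max(-1.0, min(1.0, wr - lr * g)) for wr, g in zip(w_real, grad)]

# Hypothetical starting weights and a single training example.
w = [0.05, -0.3, 0.7]
for _ in range(3):
    w = binaryconnect_step(w, x=[1.0, 1.0, 1.0], target=-1.0)
```

The small positive weight crosses zero after the first update, so its binarized value flips sign even though the stored real-valued weight changes only slightly.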

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

26 code implementations 9 Feb 2016 Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time.

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

88 code implementations 10 Feb 2015 Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio

Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images.

Caption Generation Image Captioning +1

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning

2 code implementations 30 May 2022 Aniket Didolkar, Kshitij Gupta, Anirudh Goyal, Nitesh B. Gundavarapu, Alex Lamb, Nan Rosemary Ke, Yoshua Bengio

A slow stream that is recurrent in nature aims to learn a specialized and compressed representation, by forcing chunks of $K$ time steps into a single representation which is divided into multiple vectors.

Decision Making Inductive Bias

Unsupervised State Representation Learning in Atari

6 code implementations NeurIPS 2019 Ankesh Anand, Evan Racah, Sherjil Ozair, Yoshua Bengio, Marc-Alexandre Côté, R. Devon Hjelm

State representation learning, or the ability to capture latent generative factors of an environment, is crucial for building intelligent agents that can perform a wide variety of tasks.

Atari Games Representation Learning

Benchmarking Graph Neural Networks

17 code implementations 2 Mar 2020 Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, Xavier Bresson

In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.

Benchmarking Graph Classification +3

Quaternion Recurrent Neural Networks

3 code implementations ICLR 2019 Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato de Mori, Yoshua Bengio

Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Twin Regularization for online speech recognition

2 code implementations 15 Apr 2018 Mirco Ravanelli, Dmitriy Serdyuk, Yoshua Bengio

Online speech recognition is crucial for developing natural human-machine interfaces.

speech-recognition Speech Recognition

The PyTorch-Kaldi Speech Recognition Toolkit

11 code implementations 19 Nov 2018 Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio

Experiments, that are conducted on several datasets and tasks, show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.

Distant Speech Recognition Noisy Speech Recognition

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

4 code implementations ICLR 2018 Sandeep Subramanian, Adam Trischler, Yoshua Bengio, Christopher J. Pal

In this work, we present a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model.

Multi-Task Learning Natural Language Inference +2

BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning

6 code implementations ICLR 2019 Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio

Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts.

Grounded language learning

The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget

1 code implementation ICLR 2020 Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine

This is typically the case when we have a standard conditioning input, such as a state observation, and a "privileged" input, which might correspond to the goal of a task, the output of a costly planning algorithm, or communication with another agent.

reinforcement-learning Reinforcement Learning (RL) +1

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

5 code implementations 22 Sep 2016 Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved comparable accuracy as their 32-bit counterparts using only 4-bits.

Torchmeta: A Meta-Learning library for PyTorch

5 code implementations 14 Sep 2019 Tristan Deleu, Tobias Würfl, Mandana Samiei, Joseph Paul Cohen, Yoshua Bengio

The constant introduction of standardized benchmarks in the literature has helped accelerating the recent advances in meta-learning research.

Meta-Learning

Gradient based sample selection for online continual learning

4 code implementations NeurIPS 2019 Rahaf Aljundi, Min Lin, Baptiste Goujaud, Yoshua Bengio

To prevent forgetting, a replay buffer is usually employed to store the previous data for the purpose of rehearsal.

Class Incremental Learning

On Using Very Large Target Vocabulary for Neural Machine Translation

1 code implementation IJCNLP 2015 Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio

The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models.

Machine Translation Translation

FitNets: Hints for Thin Deep Nets

3 code implementations 19 Dec 2014 Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio

In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.

Knowledge Distillation
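
The use of a teacher's intermediate representations as hints, described in the FitNets abstract above, can be sketched as a squared-error loss between a teacher hidden vector and a projected student hidden vector. The linear regressor that maps the thinner student layer to the teacher's width, the function names, and all numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def linear(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def hint_loss(teacher_hidden, student_hidden, regressor):
    """0.5 * squared L2 distance between teacher hint and projected student layer."""
    projected = linear(regressor, student_hidden)
    return 0.5 * sum((t - p) ** 2 for t, p in zip(teacher_hidden, projected))

# Hypothetical activations: the teacher layer has 3 units, the student layer has 2.
teacher = [1.0, 0.0, -1.0]
student = [0.5, -0.5]
regressor = [[1.0, 0.0],
             [0.0, 1.0],
             [-1.0, 0.0]]   # maps 2 student units to 3 teacher units

loss = hint_loss(teacher, student, regressor)
```

Minimizing a loss of this shape with respect to the student and regressor parameters pulls the student's intermediate layer toward the teacher's representation before the usual output-level distillation.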

Attention-Based Models for Speech Recognition

14 code implementations NeurIPS 2015 Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio

Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation.

Machine Translation Speech Recognition +1

Speaker Recognition from Raw Waveform with SincNet

26 code implementations 29 Jul 2018 Mirco Ravanelli, Yoshua Bengio

Rather than employing standard hand-crafted features, the latter CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.

Speaker Identification Speaker Recognition +1

Speech and Speaker Recognition from Raw Waveform with SincNet

2 code implementations 13 Dec 2018 Mirco Ravanelli, Yoshua Bengio

Deep neural networks can learn complex and abstract representations, that are progressively obtained by combining simpler ones.

Inductive Bias Speaker Recognition +2

Hyena Hierarchy: Towards Larger Convolutional Language Models

5 code implementations 21 Feb 2023 Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré

Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale.

2k 8k +2

Improving and generalizing flow-based generative models with minibatch optimal transport

2 code implementations 1 Feb 2023 Alexander Tong, Kilian Fatras, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, Yoshua Bengio

CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models.

Simulation-free Schrödinger bridges via score and flow matching

1 code implementation 7 Jul 2023 Alexander Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yanlei Zhang, Guillaume Huguet, Guy Wolf, Yoshua Bengio

We present simulation-free score and flow matching ([SF]$^2$M), a simulation-free objective for inferring stochastic dynamics given unpaired samples drawn from arbitrary source and target distributions.

Deep Complex Networks

9 code implementations ICLR 2018 Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J. Pal

Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models.

Image Classification Music Transcription +1

Avoidance Learning Using Observational Reinforcement Learning

1 code implementation 24 Sep 2019 David Venuto, Leonard Boussioux, Junhao Wang, Rola Dali, Jhelum Chakravorty, Yoshua Bengio, Doina Precup

We define avoidance learning as the process of optimizing the agent's reward while avoiding dangerous behaviors given by a demonstrator.

Imitation Learning reinforcement-learning +1

BabyAI 1.1

3 code implementations 24 Jul 2020 David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

This increases reinforcement learning sample efficiency by up to 3 times and improves imitation learning performance on the hardest level from 77% to 90.4%.

Computational Efficiency Imitation Learning +2

NICE: Non-linear Independent Components Estimation

19 code implementations 30 Oct 2014 Laurent Dinh, David Krueger, Yoshua Bengio

It is based on the idea that a good representation is one in which the data has a distribution that is easy to model.

Ranked #73 on Image Generation on CIFAR-10 (bits/dimension metric)

Image Generation

Maxout Networks

7 code implementations 18 Feb 2013 Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio

We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout.

General Classification Image Classification

Manifold Mixup: Better Representations by Interpolating Hidden States

12 code implementations ICLR 2019 Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, Aaron Courville, David Lopez-Paz, Yoshua Bengio

Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples.

Image Classification

Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

4 code implementations NeurIPS 2021 Emmanuel Bengio, Moksh Jain, Maksym Korablyov, Doina Precup, Yoshua Bengio

Using insights from Temporal Difference learning, we propose GFlowNet, based on a view of the generative process as a flow network, making it possible to handle the tricky case where different trajectories can yield the same final state, e.g., there are many ways to sequentially add atoms to generate some molecular graph.

Trajectory balance: Improved credit assignment in GFlowNets

3 code implementations 31 Jan 2022 Nikolay Malkin, Moksh Jain, Emmanuel Bengio, Chen Sun, Yoshua Bengio

Generative flow networks (GFlowNets) are a method for learning a stochastic policy for generating compositional objects, such as graphs or strings, from a given unnormalized density by sequences of actions, where many possible action sequences may lead to the same object.

Learning GFlowNets from partial episodes for improved convergence and stability

3 code implementations 26 Sep 2022 Kanika Madan, Jarrid Rector-Brooks, Maksym Korablyov, Emmanuel Bengio, Moksh Jain, Andrei Nica, Tom Bosc, Yoshua Bengio, Nikolay Malkin

Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks.

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space

1 code implementation CVPR 2017 Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski

PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable "condition" network C that tells the generator what to draw.

Image Captioning Image Inpainting

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution

2 code implementations NeurIPS 2023 Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, Chris Ré

Leveraging Hyena's new long-range capabilities, we present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single nucleotide-level - an up to 500x increase over previous dense attention-based models.

4k In-Context Learning +2

Combined Reinforcement Learning via Abstract Representations

1 code implementation 12 Sep 2018 Vincent François-Lavet, Yoshua Bengio, Doina Precup, Joelle Pineau

In the quest for efficient and robust reinforcement learning methods, both model-free and model-based approaches offer advantages.

reinforcement-learning Reinforcement Learning (RL) +1

Noisy Activation Functions

1 code implementation 1 Mar 2016 Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio

Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only).

Manifold Mixup: Learning Better Representations by Interpolating Hidden States

1 code implementation ICLR 2019 Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Aaron Courville, Ioannis Mitliagkas, Yoshua Bengio

Because the hidden states are learned, this has an important effect of encouraging the hidden states for a class to be concentrated in such a way so that interpolations within the same class or between two different classes do not intersect with the real data points from other classes.

Iterative Alternating Neural Attention for Machine Reading

1 code implementation 7 Jun 2016 Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio

We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document.

Ranked #3 on Question Answering on Children's Book Test (Accuracy-NE metric)

Question Answering Reading Comprehension

Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

1 code implementation 6 Apr 2019 Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure.

Distant Speech Recognition

Multi-task self-supervised learning for Robust Speech Recognition

1 code implementation 25 Jan 2020 Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio

We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.

Robust Speech Recognition Self-Supervised Learning +1

GMNN: Graph Markov Neural Networks

1 code implementation 15 May 2019 Meng Qu, Yoshua Bengio, Jian Tang

Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training.

Classification General Classification +3

Generative Adversarial Nets

1 code implementation NeurIPS 2014 Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

We propose a new framework for estimating generative models via adversarial nets, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.

Deep Generative Stochastic Networks Trainable by Backprop

3 code implementations 5 Jun 2013 Yoshua Bengio, Éric Thibodeau-Laufer, Guillaume Alain, Jason Yosinski

We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood.

Variance Reduction in SGD by Distributed Importance Sampling

1 code implementation 20 Nov 2015 Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, Yoshua Bengio

This leads the model to update using an unbiased estimate of the gradient which also has minimum variance when the sampling proposal is proportional to the L2-norm of the gradient.
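
The claim in the abstract above can be checked on a toy case: reweighting a sampled gradient by 1 / (N * p_i) keeps the estimate unbiased for any proposal p, and a proposal proportional to the gradient norm reduces the variance. The sketch below assumes scalar per-example gradients with hypothetical values; the helper `is_moments` is an illustrative name, and the moments are computed exactly rather than by sampling.

```python
def is_moments(grads, probs):
    """Exact mean and variance of the estimator g_i / (N * p_i), with i drawn from probs."""
    n = len(grads)
    values = [g / (n * p) for g, p in zip(grads, probs)]
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return mean, var

grads = [0.1, 0.2, 3.0, 0.7]          # hypothetical per-example scalar gradients
true_mean = sum(grads) / len(grads)   # the full-batch gradient being estimated

uniform = [1.0 / len(grads)] * len(grads)
total = sum(abs(g) for g in grads)
proportional = [abs(g) / total for g in grads]  # proposal proportional to gradient norm

mean_u, var_u = is_moments(grads, uniform)
mean_p, var_p = is_moments(grads, proportional)
```

Both proposals give the same expected value, but the proportional proposal has much lower variance. In this scalar example with same-sign gradients the variance collapses to essentially zero, which is a special case; for vector-valued gradients the proportional proposal minimizes the variance without generally eliminating it.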

MINE: Mutual Information Neural Estimation

21 code implementations 12 Jan 2018 Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R. Devon Hjelm

We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.

General Classification

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

9 code implementations 19 May 2016 Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio

Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue.

Decoder Response Generation

Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

4 code implementations 2 Jun 2016 Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bo-Wen Zhou, Yoshua Bengio, Aaron Courville

We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens.

Dialogue Generation Response Generation

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

7 code implementations 17 Jul 2015 Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau

We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models.

Decoder Word Embeddings

A Recurrent Latent Variable Model for Sequential Data

5 code implementations NeurIPS 2015 Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio

In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder.

Oracle performance for visual captioning

1 code implementation 14 Nov 2015 Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio

The task of associating images and videos with a natural language description has attracted a great amount of attention recently.

Image Captioning Language Modelling +1

End-to-End Attention-based Large Vocabulary Speech Recognition

1 code implementation 18 Aug 2015 Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio

Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs).

Acoustic Modelling Language Modelling +2

Task Loss Estimation for Sequence Prediction

1 code implementation 19 Nov 2015 Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio

Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss.

Decoder Language Modelling +2

Artificial Neural Networks Applied to Taxi Destination Prediction

1 code implementation 31 Jul 2015 Alexandre de Brébisson, Étienne Simon, Alex Auvolat, Pascal Vincent, Yoshua Bengio

We describe our first-place solution to the ECML/PKDD discovery challenge on taxi destination prediction.

Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

4 code implementations 17 Dec 2014 Grégoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio

Sentiment analysis is a common task in natural language processing that aims to detect polarity of a text document (typically a consumer review).

Binary Classification General Classification +1

Speech Model Pre-training for End-to-End Spoken Language Understanding

1 code implementation 7 Apr 2019 Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio

Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model.

Ranked #15 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Spoken Language Understanding

Meta-learning framework with applications to zero-shot time-series forecasting

3 code implementations 7 Feb 2020 Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio

Can meta-learning discover generic ways of processing time series (TS) from a diverse dataset so as to greatly improve generalization on new TS coming from different datasets?

Meta-Learning Time Series +1

GFlowNet Foundations

2 code implementations 17 Nov 2021 Yoshua Bengio, Salem Lahlou, Tristan Deleu, Edward J. Hu, Mo Tiwari, Emmanuel Bengio

Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function.

Active Learning

torchgfn: A PyTorch GFlowNet library

2 code implementations 24 May 2023 Salem Lahlou, Joseph D. Viviano, Victor Schmidt, Yoshua Bengio

The growing popularity of generative flow networks (GFlowNets or GFNs) from a range of researchers with diverse backgrounds and areas of expertise necessitates a library which facilitates the testing of new features such as training losses that can be easily compared to standard benchmark implementations, or on a set of common environments.

Chunked Autoregressive GAN for Conditional Waveform Synthesis

1 code implementation ICLR 2022 Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio

We show that simple pitch and periodicity conditioning is insufficient for reducing this error relative to using autoregression.

Inductive Bias

A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion

4 code implementations 8 Jul 2015 Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob G. Simonsen, Jian-Yun Nie

Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity.

Decoder

Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder

1 code implementation 13 Jun 2017 Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio

We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation.

Decoder Machine Translation +1

A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation

2 code implementations ACL 2016 Junyoung Chung, Kyunghyun Cho, Yoshua Bengio

The existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation.

Decoder Machine Translation +2

An Empirical Study of Example Forgetting during Deep Neural Network Learning

3 code implementations ICLR 2019 Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon

Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks.

General Classification

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches

2 code implementations 3 Sep 2014 Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio

In this paper, we focus on analyzing the properties of neural machine translation using two models: RNN Encoder-Decoder and a newly proposed gated recursive convolutional neural network.

Decoder Machine Translation +2

Maximum Entropy Generators for Energy-Based Models

2 code implementations 24 Jan 2019 Rithesh Kumar, Sherjil Ozair, Anirudh Goyal, Aaron Courville, Yoshua Bengio

Maximum likelihood estimation of energy-based models is a challenging problem due to the intractability of the log-likelihood gradient.

Anomaly Detection

Interpolation Consistency Training for Semi-Supervised Learning

4 code implementations 9 Mar 2019 Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Arno Solin, Yoshua Bengio, David Lopez-Paz

We introduce Interpolation Consistency Training (ICT), a simple and computation efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm.

General Classification Semi-Supervised Image Classification

Hierarchical Multiscale Recurrent Neural Networks

3 code implementations 6 Sep 2016 Junyoung Chung, Sungjin Ahn, Yoshua Bengio

Multiscale recurrent neural networks have been considered a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of model can actually capture the temporal dependencies by discovering the latent hierarchical structure of the sequence.

Language Modelling

Multi-Fidelity Active Learning with GFlowNets

2 code implementations 20 Jun 2023 Alex Hernandez-Garcia, Nikita Saxena, Moksh Jain, Cheng-Hao Liu, Yoshua Bengio

For example, in scientific discovery, we are often faced with the problem of exploring very large, high-dimensional spaces, where querying a high fidelity, black-box objective function is very expensive.

Active Learning

RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs

2 code implementations ICLR 2021 Meng Qu, Junkun Chen, Louis-Pascal Xhonneux, Yoshua Bengio, Jian Tang

Then in the E-step, we select a set of high-quality rules from all generated rules with both the rule generator and reasoning predictor via posterior inference; and in the M-step, the rule generator is updated with the rules selected in the E-step.

Knowledge Graphs

BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

2 code implementations 9 Oct 2014 Stephan Gouws, Yoshua Bengio, Greg Corrado

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data.

Cross-Lingual Document Classification Document Classification +3

Recurrent Independent Mechanisms

3 code implementations ICLR 2021 Anirudh Goyal, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, Bernhard Schölkopf

Learning modular structures which reflect the dynamics of the environment can lead to better generalization and robustness to changes which only affect a few of the underlying causes.

MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

2 code implementations 1 Feb 2018 Konstantinos Drossos, Stylianos Ioannis Mimilakis, Dmitriy Serdyuk, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio

Current state of the art (SOTA) results in monaural singing voice separation are obtained with deep learning based methods.

Sound Audio and Speech Processing

Equivalence of Equilibrium Propagation and Recurrent Backpropagation

1 code implementation 22 Nov 2017 Benjamin Scellier, Yoshua Bengio

Recurrent Backpropagation and Equilibrium Propagation are supervised learning algorithms for fixed point recurrent neural networks which differ in their second phase.

Equilibrium Propagation: Bridging the Gap Between Energy-Based Models and Backpropagation

2 code implementations 16 Feb 2016 Benjamin Scellier, Yoshua Bengio

Because the objective function is defined in terms of local perturbations, the second phase of Equilibrium Propagation corresponds to only nudging the prediction (fixed point, or stationary distribution) towards a configuration that reduces prediction error.

Generalization of Equilibrium Propagation to Vector Field Dynamics

3 code implementations 14 Aug 2018 Benjamin Scellier, Anirudh Goyal, Jonathan Binas, Thomas Mesnard, Yoshua Bengio

The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists.

Learning to Understand Phrases by Embedding the Dictionary

2 code implementations TACL 2016 Felix Hill, Kyunghyun Cho, Anna Korhonen, Yoshua Bengio

Distributional models that learn rich semantic word representations are a success story of recent NLP research.

General Knowledge

Representation Learning: A Review and New Perspectives

5 code implementations 24 Jun 2012 Yoshua Bengio, Aaron Courville, Pascal Vincent

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data.

Density Estimation Representation Learning

Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization

1 code implementation ICCV 2019 Md Mahfuzur Rahman Siddiquee, Zongwei Zhou, Nima Tajbakhsh, Ruibin Feng, Michael B. Gotway, Yoshua Bengio, Jianming Liang

Qualitative and quantitative evaluations demonstrate that the proposed method outperforms the state of the art in multi-domain image-to-image translation and that it surpasses predominant weakly-supervised localization methods in both disease detection and localization.

domain classification Image-to-Image Translation +1

Learning Neural Causal Models from Unknown Interventions

2 code implementations 2 Oct 2019 Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Hugo Larochelle, Bernhard Schölkopf, Michael C. Mozer, Chris Pal, Yoshua Bengio

Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data.

Meta-Learning

Ant Colony Sampling with GFlowNets for Combinatorial Optimization

2 code implementations 11 Mar 2024 Minsu Kim, Sanghyeok Choi, Jiwoo Son, Hyeonah Kim, Jinkyoo Park, Yoshua Bengio

This paper introduces the Generative Flow Ant Colony Sampler (GFACS), a novel neural-guided meta-heuristic algorithm for combinatorial optimization.

Combinatorial Optimization

On the Spectral Bias of Neural Networks

2 code implementations ICLR 2019 Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy.

Image-to-image translation for cross-domain disentanglement

1 code implementation NeurIPS 2018 Abel Gonzalez-Garcia, Joost Van de Weijer, Yoshua Bengio

We compare our model to the state-of-the-art in multi-modal image translation and achieve better results for translation on challenging datasets as well as for cross-domain retrieval on realistic datasets.

Disentanglement Image-to-Image Translation +2

Amortizing intractable inference in large language models

1 code implementation 6 Oct 2023 Edward J. Hu, Moksh Jain, Eric Elmoznino, Younesse Kaddar, Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin

Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions.

Bayesian Inference

Towards Gene Expression Convolutions using Gene Interaction Graphs

1 code implementation 18 Jun 2018 Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio

We find this approach provides an advantage for particular tasks in a low data regime but is very dependent on the quality of the graph used.

Generative Flow Networks for Discrete Probabilistic Modeling

2 code implementations 3 Feb 2022 Dinghuai Zhang, Nikolay Malkin, Zhen Liu, Alexandra Volokhova, Aaron Courville, Yoshua Bengio

We present energy-based generative flow networks (EB-GFN), a novel probabilistic modeling algorithm for high-dimensional discrete data.

Bayesian Structure Learning with Generative Flow Networks

1 code implementation 28 Feb 2022 Tristan Deleu, António Góis, Chris Emezue, Mansi Rankawat, Simon Lacoste-Julien, Stefan Bauer, Yoshua Bengio

In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) structure of Bayesian networks, from data.

Variational Inference

A Hitchhiker's Guide to Geometric GNNs for 3D Atomic Systems

1 code implementation 12 Dec 2023 Alexandre Duval, Simon V. Mathis, Chaitanya K. Joshi, Victor Schmidt, Santiago Miret, Fragkiskos D. Malliaros, Taco Cohen, Pietro Liò, Yoshua Bengio, Michael Bronstein

In these graphs, the geometric attributes transform according to the inherent physical symmetries of 3D atomic systems, including rotations and translations in Euclidean space, as well as node permutations.

Protein Structure Prediction Specificity

GraphMix: Improved Training of GNNs for Semi-Supervised Learning

1 code implementation 25 Sep 2019 Vikas Verma, Meng Qu, Kenji Kawaguchi, Alex Lamb, Yoshua Bengio, Juho Kannala, Jian Tang

We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization.

Generalization Bounds Graph Attention +1

Biological Sequence Design with GFlowNets

1 code implementation 2 Mar 2022 Moksh Jain, Emmanuel Bengio, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Bonaventure F. P. Dossou, Chanakya Ekbote, Jie Fu, Tianyu Zhang, Micheal Kilgour, Dinghuai Zhang, Lena Simine, Payel Das, Yoshua Bengio

In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective to obtain a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round.

Active Learning

Fraternal Dropout

1 code implementation ICLR 2018 Konrad Zolna, Devansh Arpit, Dendi Suhubdy, Yoshua Bengio

We show that our regularization term is upper bounded by the expectation-linear dropout objective which has been shown to address the gap due to the difference between the train and inference phases of dropout.

Image Captioning Language Modelling

Training deep neural networks with low precision multiplications

1 code implementation 22 Dec 2014 Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David

For each of those datasets and for each of those formats, we assess the impact of the precision of the multiplications on the final error after training.

Learning to Navigate in Synthetically Accessible Chemical Space Using Reinforcement Learning

1 code implementation ICML 2020 Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Hao-Ran Wei, Yashaswi Pathak, Shengchao Liu, Simon Blackburn, Karam Thomas, Connor Coley, Jian Tang, Sarath Chandar, Yoshua Bengio

In this work, we propose a novel reinforcement learning (RL) setup for drug discovery that addresses this challenge by embedding the concept of synthetic accessibility directly into the de novo compound design system.

Drug Discovery Navigate +3

GEO-Bench: Toward Foundation Models for Earth Monitoring

1 code implementation NeurIPS 2023 Alexandre Lacoste, Nils Lehmann, Pau Rodriguez, Evan David Sherwin, Hannah Kerner, Björn Lütjens, Jeremy Andrew Irvin, David Dao, Hamed Alemohammad, Alexandre Drouin, Mehmet Gunturkun, Gabriel Huang, David Vazquez, Dava Newman, Yoshua Bengio, Stefano Ermon, Xiao Xiang Zhu

Recent progress in self-supervision has shown that pre-training large neural networks on vast amounts of unsupervised data can lead to substantial increases in generalization to downstream tasks.

Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies

1 code implementation 12 Feb 2020 Giulia Zarpellon, Jason Jo, Andrea Lodi, Yoshua Bengio

We aim instead at learning a policy that generalizes across heterogeneous MILPs: our main hypothesis is that parameterizing the state of the B&B search tree can aid this type of generalization.

Imitation Learning

Equilibrated adaptive learning rates for non-convex optimization

2 code implementations NeurIPS 2015 Yann N. Dauphin, Harm de Vries, Yoshua Bengio

Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks.

Bayesian Model-Agnostic Meta-Learning

2 code implementations NeurIPS 2018 Taesup Kim, Jaesik Yoon, Ousmane Dia, Sungwoong Kim, Yoshua Bengio, Sungjin Ahn

Learning to infer Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning due to the model uncertainty inherent in the problem.

Active Learning Image Classification +2

Learning Independent Features with Adversarial Nets for Non-linear ICA

1 code implementation ICLR 2018 Philemon Brakel, Yoshua Bengio

We propose to learn independent features with adversarial objectives which optimize such measures implicitly.

An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming

1 code implementation 15 May 2021 Minkai Xu, Wujie Wang, Shitong Luo, Chence Shi, Yoshua Bengio, Rafael Gomez-Bombarelli, Jian Tang

Specifically, the molecular graph is first encoded in a latent space, and then the 3D structures are generated by solving a principled bilevel optimization program.

Bilevel Optimization

Reweighted Wake-Sleep

2 code implementations 11 Jun 2014 Jörg Bornschein, Yoshua Bengio

The wake-sleep algorithm relies on training not just the directed generative model but also a conditional generative model (the inference network) that runs backward from visible to latent, estimating the posterior distribution of latent given visible.

Combining Modular Skills in Multitask Learning

1 code implementation 28 Feb 2022 Edoardo M. Ponti, Alessandro Sordoni, Yoshua Bengio, Siva Reddy

By jointly learning these and a task-skill allocation matrix, the network for each task is instantiated as the average of the parameters of active skills.

Instruction Following reinforcement-learning +1

Unitary Evolution Recurrent Neural Networks

2 code implementations 20 Nov 2015 Martin Arjovsky, Amar Shah, Yoshua Bengio

When the eigenvalues of the hidden to hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies.

Sequential Image Classification

DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets

1 code implementation NeurIPS 2023 Lazar Atanackovic, Alexander Tong, Bo Wang, Leo J. Lee, Yoshua Bengio, Jason Hartford

In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges.

Bayesian Inference Causal Discovery

Learning Neural Generative Dynamics for Molecular Conformation Generation

3 code implementations ICLR 2021 Minkai Xu, Shitong Luo, Yoshua Bengio, Jian Peng, Jian Tang

Inspired by the recent progress in deep generative models, in this paper, we propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.

valid

Hybrid Models for Learning to Branch

1 code implementation NeurIPS 2020 Prateek Gupta, Maxime Gasse, Elias B. Khalil, M. Pawan Kumar, Andrea Lodi, Yoshua Bengio

First, in a more realistic setting where only a CPU is available, is the GNN model still competitive?

Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets

1 code implementation 26 May 2023 Dinghuai Zhang, Hanjun Dai, Nikolay Malkin, Aaron Courville, Yoshua Bengio, Ling Pan

In this paper, we design Markov decision processes (MDPs) for different combinatorial problems and propose to train conditional GFlowNets to sample from the solution space.

Combinatorial Optimization

Bidirectional Helmholtz Machines

1 code implementation 12 Jun 2015 Jörg Bornschein, Samira Shabanian, Asja Fischer, Yoshua Bengio

We present a lower-bound for the likelihood of this model and we show that optimizing this bound regularizes the model so that the Bhattacharyya distance between the bottom-up and top-down approximate distributions is minimized.

Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization

2 code implementations 4 Oct 2023 Dinghuai Zhang, Ricky T. Q. Chen, Cheng-Hao Liu, Aaron Courville, Yoshua Bengio

We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics.

Gated Orthogonal Recurrent Units: On Learning to Forget

1 code implementation 8 Jun 2017 Li Jing, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio

We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory.

Ranked #7 on Question Answering on bAbi (Accuracy (trained on 1k) metric)

Denoising Question Answering

Diet Networks: Thin Parameters for Fat Genomics

5 code implementations 28 Nov 2016 Adriana Romero, Pierre Luc Carrier, Akram Erraqabi, Tristan Sylvain, Alex Auvolat, Etienne Dejoie, Marc-André Legault, Marie-Pierre Dubé, Julie G. Hussin, Yoshua Bengio

It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g., for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units).

Parameter Prediction

Difference Target Propagation

1 code implementation 23 Dec 2014 Dong-Hyun Lee, Saizheng Zhang, Asja Fischer, Yoshua Bengio

Back-propagation has been the workhorse of recent successes of deep learning but it relies on infinitesimal effects (partial derivatives) in order to perform credit assignment.

On Adversarial Mixup Resynthesis

1 code implementation NeurIPS 2019 Christopher Beckham, Sina Honari, Vikas Verma, Alex Lamb, Farnoosh Ghadiri, R. Devon Hjelm, Yoshua Bengio, Christopher Pal

In this paper, we explore new approaches to combining information encoded within the learned representations of auto-encoders.

Resynthesis

Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures

1 code implementation 3 Sep 2019 Jordan Hoffmann, Louis Maestrati, Yoshihide Sawada, Jian Tang, Jean Michel Sellier, Yoshua Bengio

We present a method to encode and decode the position of atoms in 3-D molecules from a dataset of nearly 50,000 stable crystal unit cells that vary from containing 1 to over 100 atoms.

Decoder Drug Discovery +1

Improving speech recognition by revising gated recurrent units

1 code implementation 29 Sep 2017 Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

First, we suggest to remove the reset gate in the GRU design, resulting in a more efficient single-gate architecture.

speech-recognition Speech Recognition

Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

1 code implementation 21 Oct 2019 Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

We release the largest public ECG dataset of continuous raw signals for representation learning containing 11 thousand patients and 2 billion labelled beats.

Clustering Representation Learning

MAgNet: Mesh Agnostic Neural PDE Solver

1 code implementation 11 Oct 2022 Oussama Boussif, Dan Assouline, Loubna Benabbou, Yoshua Bengio

The computational complexity of classical numerical methods for solving Partial Differential Equations (PDE) scales significantly as the resolution increases.

Zero-shot Generalization

Is a Modular Architecture Enough?

1 code implementation 6 Jun 2022 Sarthak Mittal, Yoshua Bengio, Guillaume Lajoie

Inspired by human cognition, machine learning systems are gradually revealing advantages of sparser and more modular architectures.

Out-of-Distribution Generalization

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

2 code implementations 15 Aug 2022 Tianyu Zhang, Andrew Williams, Soham Phade, Sunil Srinivasa, Yang Zhang, Prateek Gupta, Yoshua Bengio, Stephan Zheng

To facilitate this research, here we introduce RICE-N, a multi-region integrated assessment model that simulates the global climate and economy, and which can be used to design and evaluate the strategic outcomes for different negotiation and agreement frameworks.

Ethics Multi-agent Reinforcement Learning

GFlowNet-EM for learning compositional latent variable models

1 code implementation 13 Feb 2023 Edward J. Hu, Nikolay Malkin, Moksh Jain, Katie Everett, Alexandros Graikos, Yoshua Bengio

Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents.

Variational Inference

FloW: A Dataset and Benchmark for Floating Waste Detection in Inland Waters

1 code implementation ICCV 2021 Yuwei Cheng, Jiannan Zhu, Mengxin Jiang, Jie Fu, Changsong Pang, Peidong Wang, Kris Sankaran, Olawale Onabola, Yimin Liu, Dianbo Liu, Yoshua Bengio

To promote the practical application for autonomous floating wastes cleaning, we present FloW, the first dataset for floating waste detection in inland water areas.

object-detection Robust Object Detection

Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net

1 code implementation NeurIPS 2017 Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio

The energy function is then modified so the model and data distributions match, with no guarantee on the number of steps required for the Markov chain to converge.

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

4 code implementations NeurIPS 2014 Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.

Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

1 code implementation ICML 2020 Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

To effectively utilize the wealth of potential top-down information available, and to prevent the cacophony of intermixed signals in a bidirectional architecture, mechanisms are needed to restrict information flow.

Language Modelling Open-Ended Question Answering +2

The Causal-Neural Connection: Expressiveness, Learnability, and Inference

2 code implementations NeurIPS 2021 Kevin Xia, Kai-Zhan Lee, Yoshua Bengio, Elias Bareinboim

Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM.

Causal Identification Causal Inference +1

MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

1 code implementation 27 Dec 2022 Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels.

Data Augmentation
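
The mixup operation described in the MixupE entry above (linearly interpolating pairs of inputs and their labels) can be sketched in a few lines. This is only an illustration of plain mixup, not the MixupE method or the authors' code. The function names `mixup_pair` and `sample_lambda` are hypothetical, and drawing the mixing weight from a Beta(alpha, alpha) distribution is an assumption taken from common mixup practice rather than from this listing.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Return the convex combination of two examples and their labels.

    x_a, x_b: equal-length lists of floats (inputs).
    y_a, y_b: equal-length lists of floats (e.g. one-hot labels).
    lam: mixing weight in [0, 1]; lam = 1 returns the first example unchanged.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixing weight from Beta(alpha, alpha) (assumed, not from the listing)."""
    return rng.betavariate(alpha, alpha)


# Usage: mix one pair of toy examples with a randomly drawn weight.
lam = sample_lambda(alpha=1.0)
x_mix, y_mix = mixup_pair([0.0, 2.0], [1.0, 0.0], [4.0, 6.0], [0.0, 1.0], lam)
```

Because the same weight is applied to inputs and labels, the mixed label stays a valid probability vector whenever both source labels are.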

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

1 code implementation 9 Feb 2024 Tara Akhound-Sadegh, Jarrid Rector-Brooks, Avishek Joey Bose, Sarthak Mittal, Pablo Lemos, Cheng-Hao Liu, Marcin Sendera, Siamak Ravanbakhsh, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Alexander Tong

Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science.

Denoising Efficient Exploration

Iterative Neural Autoregressive Distribution Estimator (NADE-k)

1 code implementation 5 Jun 2014 Tapani Raiko, Li Yao, Kyunghyun Cho, Yoshua Bengio

Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data.

Density Estimation Image Generation +1

Iterative Neural Autoregressive Distribution Estimator NADE-k

1 code implementation NeurIPS 2014 Tapani Raiko, Yao Li, Kyunghyun Cho, Yoshua Bengio

Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data.

Density Estimation Image Generation +1

Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics

1 code implementation NeurIPS 2019 Giancarlo Kerg, Kyle Goyette, Maximilian Puelma Touzel, Gauthier Gidel, Eugene Vorontsov, Yoshua Bengio, Guillaume Lajoie

A recent strategy to circumvent the exploding and vanishing gradient problem in RNNs, and to allow the stable propagation of signals over long time scales, is to constrain recurrent connectivity matrices to be orthogonal or unitary.
