Search Results for author: Yoshua Bengio

Found 576 papers, 295 papers with code

Practical recommendations for gradient-based training of deep architectures

14 code implementations • 24 Jun 2012 • Yoshua Bengio

Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters.

76,633

Graph Attention Networks

90 code implementations • ICLR 2018 • Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.

Ranked #1 on Node Classification on Pubmed (Validation metric)

Document Classification Graph Attention +8

48,779

Generative Adversarial Networks

183 code implementations • Proceedings of the 27th International Conference on Neural Information Processing Systems 2014 • Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.

Super-Resolution Time-Series Few-Shot Learning with Heterogeneous Channels

48,779
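
As a hedged illustration of the adversarial process summarised in the Generative Adversarial Networks entry above, the two-model game can be written as a value function V(D, G) that D tries to increase and G tries to decrease. The sketch below estimates that value from lists of discriminator outputs; the function name `value_function` and the sample probabilities are assumptions made for illustration, not code from the paper or its implementations.

```python
import math


def value_function(d_real, d_fake):
    """Estimate V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] from samples.

    d_real: D's outputs on training samples, each strictly between 0 and 1.
    d_fake: D's outputs on samples produced by G, each strictly between 0 and 1.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical discriminator outputs: D separates real from generated data well.
confident = value_function([0.9, 0.8], [0.1, 0.2])

# Hypothetical outputs when G fools D completely, so D answers 0.5 everywhere.
fooled = value_function([0.5, 0.5], [0.5, 0.5])
```

When D cannot tell the two sources apart, the estimate equals -log 4, which is lower than the value a confident discriminator reaches; pushing the value down in this way is what "maximizing the probability of D making a mistake" amounts to in this formulation.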

Hyperbolic Discounting and Learning over Multiple Horizons

1 code implementation • ICLR 2020 • William Fedus, Carles Gelada, Yoshua Bengio, Marc G. Bellemare, Hugo Larochelle

Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process.

reinforcement-learning Reinforcement Learning (RL)

32,943

On Catastrophic Interference in Atari 2600 Games

1 code implementation • 28 Feb 2020 • William Fedus, Dibya Ghosh, John D. Martin, Marc G. Bellemare, Yoshua Bengio, Hugo Larochelle

Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.

Atari Games reinforcement-learning +1

32,943

Revisiting Fundamentals of Experience Replay

2 code implementations • ICML 2020 • William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding.

DQN Replay Dataset Q-Learning +1

32,943

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

21 code implementations • NeurIPS 2019 • Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville

In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by introducing a set of architectural changes and simple training techniques.

Speech Synthesis Translation

29,694

Boundary-Seeking Generative Adversarial Networks

6 code implementations • 27 Feb 2017 • R. Devon Hjelm, Athul Paul Jacob, Tong Che, Adam Trischler, Kyunghyun Cho, Yoshua Bengio

We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator.

Scene Understanding Text Generation

15,758

Neural Machine Translation by Jointly Learning to Align and Translate

121 code implementations • 1 Sep 2014 • Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

Neural machine translation is a recently proposed approach to machine translation.

Ranked #4 on Dialogue Generation on Persona-Chat (using extra training data)

Bangla Spelling Error Correction Decoder +4

13,735

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

43 code implementations • 3 Jun 2014 • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio

In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN).

Ranked #47 on Machine Translation on WMT2014 English-French

Decoder Machine Translation +1

13,735

Deep Graph Infomax

11 code implementations • ICLR 2019 • Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, R. Devon Hjelm

We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner.

Ranked #49 on Node Classification on Citeseer

General Classification Node Classification

13,042

Theano: A Python framework for fast computation of mathematical expressions

1 code implementation • 9 May 2016 • The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang

Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.

BIG-bench Machine Learning Clustering +2

9,855

A Structured Self-attentive Sentence Embedding

52 code implementations • 9 Mar 2017 • Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bo-Wen Zhou, Yoshua Bengio

This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention.

General Classification Natural Language Inference +5

8,485

SpeechBrain: A General-Purpose Speech Toolkit

4 code implementations • 8 Jun 2021 • Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato de Mori, Yoshua Bengio

SpeechBrain is an open-source and all-in-one speech toolkit.

Language Identification Spoken Language Understanding

7,934

Compositional Attention: Disentangling Search and Retrieval

3 code implementations • ICLR 2022 • Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie

Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed.

Retrieval

7,668

N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

19 code implementations • ICLR 2020 • Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio

We focus on solving the univariate time series point forecasting problem using deep learning.

Ranked #3 on Time-Series Few-Shot Learning with Heterogeneous Channels on TimeHetNet

Time Series Time-Series Few-Shot Learning with Heterogeneous Channels +1

7,334

BinaryConnect: Training Deep Neural Networks with binary weights during propagations

5 code implementations • NeurIPS 2015 • Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David

We introduce BinaryConnect, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated.

Ranked #30 on Image Classification on SVHN

6,298
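
A minimal sketch of the idea in the BinaryConnect entry above, under stated assumptions: the forward and backward passes use a binarized copy of the weights, while the update is accumulated in the stored real-valued weights. The single linear unit, squared-error loss, sign binarization, learning rate, and clipping to [-1, 1] are illustrative choices here, and the names `binarize` and `train_step` are hypothetical.

```python
def binarize(weights):
    """Map each real-valued weight to +1.0 or -1.0 by its sign."""
    return [1.0 if w >= 0.0 else -1.0 for w in weights]


def train_step(real_weights, inputs, target, lr=0.1):
    """One gradient step for a single linear unit with squared-error loss."""
    # Forward pass: only the binarized copy of the weights is used.
    binary_weights = binarize(real_weights)
    prediction = sum(w * x for w, x in zip(binary_weights, inputs))
    error = prediction - target
    # Backward pass: gradient of 0.5 * error**2 w.r.t. the binary weights.
    grads = [error * x for x in inputs]
    # The update is accumulated in the real-valued weights, which are then
    # clipped so they stay within the range the binarization can express.
    updated = [max(-1.0, min(1.0, w - lr * g))
               for w, g in zip(real_weights, grads)]
    return updated, prediction


# Hypothetical example values.
weights = [0.05, -0.3]
new_weights, prediction = train_step(weights, inputs=[1.0, 2.0], target=1.0)
```

In this example the second stored weight crosses zero after the update, so its binarized value flips from -1 to +1 on the next forward pass even though the stored value changed only slightly.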

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

26 code implementations • 9 Feb 2016 • Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time.

6,298

Predicting COVID-19 Pneumonia Severity on Chest X-ray with Deep Learning

6 code implementations • 24 May 2020 • Joseph Paul Cohen, Lan Dao, Paul Morrison, Karsten Roth, Yoshua Bengio, Beiyi Shen, Almas Abbasi, Mahsa Hoshmand-Kochi, Marzyeh Ghassemi, Haifang Li, Tim Q Duong

In this study, we present a severity score prediction model for COVID-19 pneumonia for frontal chest X-ray images.

Management

2,977

Pylearn2: a machine learning research library

6 code implementations • 20 Aug 2013 • Ian J. Goodfellow, David Warde-Farley, Pascal Lamblin, Vincent Dumoulin, Mehdi Mirza, Razvan Pascanu, James Bergstra, Frédéric Bastien, Yoshua Bengio

Pylearn2 is a machine learning research library.

BIG-bench Machine Learning Philosophy

2,755

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

88 code implementations • 10 Feb 2015 • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio

Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images.

Caption Generation Image Captioning +1

2,664

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning

2 code implementations • 30 May 2022 • Aniket Didolkar, Kshitij Gupta, Anirudh Goyal, Nitesh B. Gundavarapu, Alex Lamb, Nan Rosemary Ke, Yoshua Bengio

A slow stream that is recurrent in nature aims to learn a specialized and compressed representation, by forcing chunks of $K$ time steps into a single representation which is divided into multiple vectors.

Decision Making Inductive Bias

2,651

Unsupervised State Representation Learning in Atari

6 code implementations • NeurIPS 2019 • Ankesh Anand, Evan Racah, Sherjil Ozair, Yoshua Bengio, Marc-Alexandre Côté, R. Devon Hjelm

State representation learning, or the ability to capture latent generative factors of an environment, is crucial for building intelligent agents that can perform a wide variety of tasks.

Atari Games Representation Learning

2,602

Benchmarking Graph Neural Networks

17 code implementations • 2 Mar 2020 • Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, Xavier Bresson

In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.

Ranked #1 on Link Prediction on COLLAB

Benchmarking Graph Classification +3

2,433

Quaternion Recurrent Neural Networks

3 code implementations • ICLR 2019 • Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato de Mori, Yoshua Bengio

Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

2,353

Twin Regularization for online speech recognition

2 code implementations • 15 Apr 2018 • Mirco Ravanelli, Dmitriy Serdyuk, Yoshua Bengio

Online speech recognition is crucial for developing natural human-machine interfaces.

speech-recognition Speech Recognition

2,353

Twin Networks: Matching the Future for Sequence Generation

2 code implementations • ICLR 2018 • Dmitriy Serdyuk, Nan Rosemary Ke, Alessandro Sordoni, Adam Trischler, Chris Pal, Yoshua Bengio

We propose a simple technique for encouraging generative RNNs to plan ahead.

Caption Generation speech-recognition +1

2,353

The PyTorch-Kaldi Speech Recognition Toolkit

11 code implementations • 19 Nov 2018 • Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio

Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.

Ranked #1 on Distant Speech Recognition on DIRHA English WSJ

Distant Speech Recognition Noisy Speech Recognition

2,353

Interpretable Convolutional Filters with SincNet

1 code implementation • 23 Nov 2018 • Mirco Ravanelli, Yoshua Bengio

Deep learning is currently playing a crucial role toward higher levels of artificial intelligence.

Ranked #3 on Distant Speech Recognition on DIRHA English WSJ

Distant Speech Recognition Inductive Bias +1

2,353

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

4 code implementations • ICLR 2018 • Sandeep Subramanian, Adam Trischler, Yoshua Bengio, Christopher J. Pal

In this work, we present a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model.

Ranked #1 on Semantic Textual Similarity on SentEval

Multi-Task Learning Natural Language Inference +2

2,279

BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning

6 code implementations • ICLR 2019 • Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio

Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts.

Grounded language learning

2,016

The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget

1 code implementation • ICLR 2020 • Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine

This is typically the case when we have a standard conditioning input, such as a state observation, and a "privileged" input, which might correspond to the goal of a task, the output of a costly planning algorithm, or communication with another agent.

reinforcement-learning Reinforcement Learning (RL) +1

2,015

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

5 code implementations • 22 Sep 2016 • Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

Quantized recurrent neural networks were tested on the Penn Treebank dataset and achieved accuracy comparable to their 32-bit counterparts using only 4 bits.

1,981

Torchmeta: A Meta-Learning library for PyTorch

5 code implementations • 14 Sep 2019 • Tristan Deleu, Tobias Würfl, Mandana Samiei, Joseph Paul Cohen, Yoshua Bengio

The constant introduction of standardized benchmarks in the literature has helped accelerate the recent advances in meta-learning research.

Meta-Learning

1,936

Gradient based sample selection for online continual learning

4 code implementations • NeurIPS 2019 • Rahaf Aljundi, Min Lin, Baptiste Goujaud, Yoshua Bengio

To prevent forgetting, a replay buffer is usually employed to store the previous data for the purpose of rehearsal.

Class Incremental Learning

1,682

On Using Very Large Target Vocabulary for Neural Machine Translation

1 code implementation • IJCNLP 2015 • Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio

The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models.

Machine Translation Translation

1,456

Gradient Starvation: A Learning Proclivity in Neural Networks

2 code implementations • NeurIPS 2021 • Mohammad Pezeshki, Sékou-Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie

We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks.

Ranked #1 on Out-of-Distribution Generalization on ImageNet-W

Out-of-Distribution Generalization

1,335

Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

2 code implementations • NeurIPS 2021 • Kartik Ahuja, Ethan Caballero, Dinghuai Zhang, Jean-Christophe Gagnon-Audet, Yoshua Bengio, Ioannis Mitliagkas, Irina Rish

To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD.

Out-of-Distribution Generalization regression

1,335

FitNets: Hints for Thin Deep Nets

3 code implementations • 19 Dec 2014 • Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio

In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.

Knowledge Distillation

1,281

Attention-Based Models for Speech Recognition

14 code implementations • NeurIPS 2015 • Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio

Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation.

Ranked #17 on Speech Recognition on TIMIT

Machine Translation Speech Recognition +1

1,159

Blocks and Fuel: Frameworks for deep learning

5 code implementations • 1 Jun 2015 • Bart van Merriënboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, Yoshua Bengio

We introduce two Python frameworks to train neural networks on large datasets: Blocks and Fuel.

BIG-bench Machine Learning

1,159

Speaker Recognition from Raw Waveform with SincNet

26 code implementations • 29 Jul 2018 • Mirco Ravanelli, Yoshua Bengio

Rather than employing standard hand-crafted features, the latter CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.

Speaker Identification Speaker Recognition +1

1,100

Speech and Speaker Recognition from Raw Waveform with SincNet

2 code implementations • 13 Dec 2018 • Mirco Ravanelli, Yoshua Bengio

Deep neural networks can learn complex and abstract representations, that are progressively obtained by combining simpler ones.

Inductive Bias Speaker Recognition +2

1,100

Hyena Hierarchy: Towards Larger Convolutional Language Models

5 code implementations • 21 Feb 2023 • Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré

Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale.

Ranked #37 on Language Modelling on WikiText-103

2k 8k +2

842

Learning deep representations by mutual information estimation and maximization

9 code implementations • ICLR 2019 • R. Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio

In this work, we perform unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder.

General Classification Mutual Information Estimation +1

792

Improving and generalizing flow-based generative models with minibatch optimal transport

2 code implementations • 1 Feb 2023 • Alexander Tong, Kilian Fatras, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, Yoshua Bengio

CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models.

760

Simulation-free Schrödinger bridges via score and flow matching

1 code implementation • 7 Jul 2023 • Alexander Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yanlei Zhang, Guillaume Huguet, Guy Wolf, Yoshua Bengio

We present simulation-free score and flow matching ([SF]$^2$M), a simulation-free objective for inferring stochastic dynamics given unpaired samples drawn from arbitrary source and target distributions.

760

Deep Complex Networks

9 code implementations • ICLR 2018 • Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J. Pal

Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models.

Ranked #3 on Music Transcription on MusicNet

Image Classification Music Transcription +1

704

Avoidance Learning Using Observational Reinforcement Learning

1 code implementation • 24 Sep 2019 • David Venuto, Leonard Boussioux, Junhao Wang, Rola Dali, Jhelum Chakravorty, Yoshua Bengio, Doina Precup

We define avoidance learning as the process of optimizing the agent's reward while avoiding dangerous behaviors given by a demonstrator.

Imitation Learning reinforcement-learning +1

672

BabyAI 1.1

3 code implementations • 24 Jul 2020 • David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

This increases reinforcement learning sample efficiency by up to 3 times and improves imitation learning performance on the hardest level from 77% to 90.4%.

Computational Efficiency Imitation Learning +2

669

An Actor-Critic Algorithm for Sequence Prediction

3 code implementations • 24 Jul 2016 • Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio

We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL).

Ranked #8 on Machine Translation on IWSLT2015 English-German

Caption Generation Machine Translation +3

659

NICE: Non-linear Independent Components Estimation

19 code implementations • 30 Oct 2014 • Laurent Dinh, David Krueger, Yoshua Bengio

It is based on the idea that a good representation is one in which the data has a distribution that is easy to model.

Ranked #73 on Image Generation on CIFAR-10 (bits/dimension metric)

Image Generation

612

Maxout Networks

7 code implementations • 18 Feb 2013 • Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio

We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout.

Ranked #35 on Image Classification on MNIST

General Classification Image Classification

587

Manifold Mixup: Better Representations by Interpolating Hidden States

12 code implementations • ICLR 2019 • Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, Aaron Courville, David Lopez-Paz, Yoshua Bengio

Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples.

Ranked #18 on Image Classification on OmniBenchmark

Image Classification

576

Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

4 code implementations • NeurIPS 2021 • Emmanuel Bengio, Moksh Jain, Maksym Korablyov, Doina Precup, Yoshua Bengio

Using insights from Temporal Difference learning, we propose GFlowNet, based on a view of the generative process as a flow network, making it possible to handle the tricky case where different trajectories can yield the same final state, e.g., there are many ways to sequentially add atoms to generate some molecular graph.

566

Trajectory balance: Improved credit assignment in GFlowNets

3 code implementations • 31 Jan 2022 • Nikolay Malkin, Moksh Jain, Emmanuel Bengio, Chen Sun, Yoshua Bengio

Generative flow networks (GFlowNets) are a method for learning a stochastic policy for generating compositional objects, such as graphs or strings, from a given unnormalized density by sequences of actions, where many possible action sequences may lead to the same object.

566

Learning GFlowNets from partial episodes for improved convergence and stability

3 code implementations • 26 Sep 2022 • Kanika Madan, Jarrid Rector-Brooks, Maksym Korablyov, Emmanuel Bengio, Moksh Jain, Andrei Nica, Tom Bosc, Yoshua Bengio, Nikolay Malkin

Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks.

566

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space

1 code implementation • CVPR 2017 • Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski

PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable "condition" network C that tells the generator what to draw.

Image Captioning Image Inpainting

539

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

4 code implementations • 22 Dec 2016 • Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, Yoshua Bengio

In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time.

Audio Generation Temporal Sequences

533

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution

2 code implementations • NeurIPS 2023 • Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, Chris Ré

Leveraging Hyena's new long-range capabilities, we present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single nucleotide-level - an up to 500x increase over previous dense attention-based models.

4k In-Context Learning +2

491

The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

22 code implementations • 28 Nov 2016 • Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio

State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs).

Ranked #9 on Semantic Segmentation on CamVid

Image Segmentation Segmentation +1

488

Combined Reinforcement Learning via Abstract Representations

1 code implementation • 12 Sep 2018 • Vincent François-Lavet, Yoshua Bengio, Doina Precup, Joelle Pineau

In the quest for efficient and robust reinforcement learning methods, both model-free and model-based approaches offer advantages.

reinforcement-learning Reinforcement Learning (RL) +1

485

Noisy Activation Functions

1 code implementation • 1 Mar 2016 • Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio

Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only).

479

Manifold Mixup: Learning Better Representations by Interpolating Hidden States

1 code implementation • ICLR 2019 • Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Aaron Courville, Ioannis Mitliagkas, Yoshua Bengio

Because the hidden states are learned, this has an important effect of encouraging the hidden states for a class to be concentrated in such a way so that interpolations within the same class or between two different classes do not intersect with the real data points from other classes.

477

Iterative Alternating Neural Attention for Machine Reading

1 code implementation • 7 Jun 2016 • Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio

We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document.

Ranked #3 on Question Answering on Children's Book Test (Accuracy-NE metric)

Question Answering Reading Comprehension

438

Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

1 code implementation • 6 Apr 2019 • Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure.

Ranked #2 on Distant Speech Recognition on DIRHA English WSJ

Distant Speech Recognition

436

Multi-task self-supervised learning for Robust Speech Recognition

1 code implementation • 25 Jan 2020 • Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio

We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.

Robust Speech Recognition Self-Supervised Learning +1

436

Challenges in Representation Learning: A report on three machine learning contests

11 code implementations • 1 Jul 2013 • Ian J. Goodfellow, Dumitru Erhan, Pierre Luc Carrier, Aaron Courville, Mehdi Mirza, Ben Hamner, Will Cukierski, Yichuan Tang, David Thaler, Dong-Hyun Lee, Yingbo Zhou, Chetan Ramaiah, Fangxiang Feng, Ruifan Li, Xiaojie Wang, Dimitris Athanasakis, John Shawe-Taylor, Maxim Milakov, John Park, Radu Ionescu, Marius Popescu, Cristian Grozea, James Bergstra, Jingjing Xie, Lukasz Romaszko, Bing Xu, Zhang Chuang, Yoshua Bengio

The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge.

Ranked #12 on Facial Expression Recognition (FER) on FER2013

BIG-bench Machine Learning Facial Expression Recognition +2

421

Paper
Code

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

1 code implementation • EMNLP 2018 • Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning

Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers.

Ranked #34 on Question Answering on HotpotQA

Multi-hop Question Answering Question Answering +1

401

Paper
Code

GMNN: Graph Markov Neural Networks

1 code implementation • 15 May 2019 • Meng Qu, Yoshua Bengio, Jian Tang

Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training.

Classification General Classification +3

393

Paper
Code

Neural Networks with Few Multiplications

2 code implementations • 11 Oct 2015 • Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio

For most deep learning algorithms, training is notoriously time-consuming.

General Classification

374

Paper
Code

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

13 code implementations • 11 Dec 2014 • Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio

In this paper we compare different types of recurrent units in recurrent neural networks (RNNs).

Ranked #10 on Music Modeling on JSB Chorales

Music Modeling

371

Paper
Code

Generative Adversarial Nets

1 code implementation • NeurIPS 2014 • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

We propose a new framework for estimating generative models via adversarial nets, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.

334

Paper
Code

Deep Generative Stochastic Networks Trainable by Backprop

3 code implementations • 5 Jun 2013 • Yoshua Bengio, Éric Thibodeau-Laufer, Guillaume Alain, Jason Yosinski

We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood.

318

Paper
Code

Variance Reduction in SGD by Distributed Importance Sampling

1 code implementation • 20 Nov 2015 • Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, Yoshua Bengio

This leads the model to update using an unbiased estimate of the gradient which also has minimum variance when the sampling proposal is proportional to the L2-norm of the gradient.

316

Paper
Code

MINE: Mutual Information Neural Estimation

21 code implementations • 12 Jan 2018 • Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R. Devon Hjelm

We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.

General Classification

315

Paper
Code

Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations

6 code implementations • 3 Jun 2016 • David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal

We propose zoneout, a novel method for regularizing RNNs.

Language Modelling

311

Paper
Code

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

9 code implementations • 19 May 2016 • Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio

Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue.

Decoder Response Generation

308

Paper
Code

Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

4 code implementations • 2 Jun 2016 • Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bo-Wen Zhou, Yoshua Bengio, Aaron Courville

We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens.

Ranked #1 on Dialogue Generation on Ubuntu Dialogue (Activity)

Dialogue Generation Response Generation

308

Paper
Code

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

7 code implementations • 17 Jul 2015 • Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau

We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models.

Decoder Word Embeddings

308

Paper
Code

Brain Tumor Segmentation with Deep Neural Networks

15 code implementations • 13 May 2015 • Mohammad Havaei, Axel Davy, David Warde-Farley, Antoine Biard, Aaron Courville, Yoshua Bengio, Chris Pal, Pierre-Marc Jodoin, Hugo Larochelle

Finally, we explore a cascade architecture in which the output of a basic CNN is treated as an additional source of information for a subsequent CNN.

Ranked #1 on Brain Tumor Segmentation on BRATS-2013 leaderboard

Brain Tumor Segmentation Tumor Segmentation

299

Paper
Code

A Recurrent Latent Variable Model for Sequential Data

5 code implementations • NeurIPS 2015 • Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio

In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder.

288

Paper
Code

HighRes-net: Multi-Frame Super-Resolution by Recursive Fusion

1 code implementation • ICLR 2020 • Michel Deudon, Alfredo Kalaitzis, Md Rifat Arefin, Israel Goytom, Zhichao Lin, Kris Sankaran, Vincent Michalski, Samira E. Kahou, Julien Cornebise, Yoshua Bengio

Multi-frame Super-Resolution (MFSR) offers a more grounded approach to the ill-posed problem, by conditioning on multiple low-resolution views.

Ranked #6 on Multi-Frame Super-Resolution on PROBA-V

De-aliasing Earth Observation +2

272

Paper
Code

HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery

2 code implementations • 15 Feb 2020 • Michel Deudon, Alfredo Kalaitzis, Israel Goytom, Md Rifat Arefin, Zhichao Lin, Kris Sankaran, Vincent Michalski, Samira E. Kahou, Julien Cornebise, Yoshua Bengio

Multi-frame Super-Resolution (MFSR) offers a more grounded approach to the ill-posed problem, by conditioning on multiple low-resolution views.

Ranked #6 on Multi-Frame Super-Resolution on PROBA-V

De-aliasing Earth Observation +2

272

Paper
Code

Oracle performance for visual captioning

1 code implementation • 14 Nov 2015 • Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio

The task of associating images and videos with a natural language description has attracted a great amount of attention recently.

Image Captioning Language Modelling +1

260

Paper
Code

End-to-End Attention-based Large Vocabulary Speech Recognition

1 code implementation • 18 Aug 2015 • Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio

Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs).

Acoustic Modelling Language Modelling +2

260

Paper
Code

Task Loss Estimation for Sequence Prediction

1 code implementation • 19 Nov 2015 • Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio

Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss.

Decoder Language Modelling +2

260

Paper
Code

Artificial Neural Networks Applied to Taxi Destination Prediction

1 code implementation • 31 Jul 2015 • Alexandre de Brébisson, Étienne Simon, Alex Auvolat, Pascal Vincent, Yoshua Bengio

We describe our first-place solution to the ECML/PKDD discovery challenge on taxi destination prediction.

260

Paper
Code

Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

4 code implementations • 17 Dec 2014 • Grégoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio

Sentiment analysis is a common task in natural language processing that aims to detect polarity of a text document (typically a consumer review).

Binary Classification General Classification +1

246

Paper
Code

Graph Neural Networks with Learnable Structural and Positional Representations

1 code implementation • ICLR 2022 • Vijay Prakash Dwivedi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, Xavier Bresson

An approach to tackle this issue is to introduce Positional Encoding (PE) of nodes, and inject it into the input layer, like in Transformers.

Ranked #12 on Graph Regression on ZINC-500k

Graph Regression Knowledge Graphs +1

223

Paper
Code

Speech Model Pre-training for End-to-End Spoken Language Understanding

1 code implementation • 7 Apr 2019 • Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio

Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model.

Ranked #15 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Spoken Language Understanding

220

Paper
Code

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

1 code implementation • ICLR 2021 • Ossama Ahmed, Frederik Träuble, Anirudh Goyal, Alexander Neitz, Yoshua Bengio, Bernhard Schölkopf, Manuel Wüthrich, Stefan Bauer

To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment.

Reinforcement Learning (RL) Transfer Learning

201

Paper
Code

Meta-learning framework with applications to zero-shot time-series forecasting

3 code implementations • 7 Feb 2020 • Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio

Can meta-learning discover generic ways of processing time series (TS) from a diverse dataset so as to greatly improve generalization on new TS coming from different datasets?

Meta-Learning Time Series +1

195

Paper
Code

GFlowNet Foundations

2 code implementations • 17 Nov 2021 • Yoshua Bengio, Salem Lahlou, Tristan Deleu, Edward J. Hu, Mo Tiwari, Emmanuel Bengio

Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function.

Active Learning

187

Paper
Code

torchgfn: A PyTorch GFlowNet library

2 code implementations • 24 May 2023 • Salem Lahlou, Joseph D. Viviano, Victor Schmidt, Yoshua Bengio

The growing popularity of generative flow networks (GFlowNets or GFNs) from a range of researchers with diverse backgrounds and areas of expertise necessitates a library which facilitates the testing of new features such as training losses that can be easily compared to standard benchmark implementations, or on a set of common environments.

187

Paper
Code

Chunked Autoregressive GAN for Conditional Waveform Synthesis

1 code implementation • ICLR 2022 • Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio

We show that simple pitch and periodicity conditioning is insufficient for reducing this error relative to using autoregression.

Inductive Bias

180

Paper
Code

A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion

4 code implementations • 8 Jul 2015 • Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob G. Simonsen, Jian-Yun Nie

Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity.

Decoder

175

Paper
Code

Plan, Attend, Generate: Planning for Sequence-to-Sequence Models

1 code implementation • NeurIPS 2017 • Francis Dutil, Caglar Gulcehre, Adam Trischler, Yoshua Bengio

We investigate the integration of a planning mechanism into sequence-to-sequence models using attention.

Question Generation Question-Generation +2

168

Paper
Code

Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder

1 code implementation • 13 Jun 2017 • Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio

We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation.

Decoder Machine Translation +1

168

Paper
Code

A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation

2 code implementations • ACL 2016 • Junyoung Chung, Kyunghyun Cho, Yoshua Bengio

The existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation.

Ranked #3 on Machine Translation on WMT2015 English-German

Decoder Machine Translation +2

168

Paper
Code

An Empirical Study of Example Forgetting during Deep Neural Network Learning

3 code implementations • ICLR 2019 • Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon

Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks.

General Classification

165

Paper
Code

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches

2 code implementations • 3 Sep 2014 • Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio

In this paper, we focus on analyzing the properties of neural machine translation using two models: the RNN Encoder-Decoder and a newly proposed gated recursive convolutional neural network.

Decoder Machine Translation +2

150

Paper
Code

Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Too Much Accuracy

3 code implementations • 16 Jun 2019 • Alex Lamb, Vikas Verma, Kenji Kawaguchi, Alexander Matyasko, Savya Khosla, Juho Kannala, Yoshua Bengio

Adversarial robustness has become a central goal in deep learning, in both theory and practice.

Adversarial Robustness

148

Paper
Code

Maximum Entropy Generators for Energy-Based Models

2 code implementations • 24 Jan 2019 • Rithesh Kumar, Sherjil Ozair, Anirudh Goyal, Aaron Courville, Yoshua Bengio

Maximum likelihood estimation of energy-based models is a challenging problem due to the intractability of the log-likelihood gradient.

Anomaly Detection

145

Paper
Code

Interpolation Consistency Training for Semi-Supervised Learning

4 code implementations • 9 Mar 2019 • Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Arno Solin, Yoshua Bengio, David Lopez-Paz

We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm.

Ranked #2 on Semi-Supervised Image Classification on CIFAR-10, 2000 Labels

General Classification Semi-Supervised Image Classification

141

Paper
Code

Hierarchical Multiscale Recurrent Neural Networks

3 code implementations • 6 Sep 2016 • Junyoung Chung, Sungjin Ahn, Yoshua Bengio

Multiscale recurrent neural networks have been considered a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of model can actually capture the temporal dependencies by discovering the latent hierarchical structure of the sequence.

Ranked #19 on Language Modelling on Text8

Language Modelling

136

Paper
Code

ReSeg: A Recurrent Neural Network-based Model for Semantic Segmentation

2 code implementations • 22 Nov 2015 • Francesco Visin, Marco Ciccone, Adriana Romero, Kyle Kastner, Kyunghyun Cho, Yoshua Bengio, Matteo Matteucci, Aaron Courville

Moreover, ReNet layers are stacked on top of pre-trained convolutional layers, benefiting from generic local features.

Ranked #18 on Semantic Segmentation on CamVid

Segmentation Semantic Segmentation +1

124

Paper
Code

ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks

4 code implementations • 3 May 2015 • Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio

In this paper, we propose a deep neural network architecture for object recognition based on recurrent neural networks.

Ranked #35 on Image Classification on MNIST

Image Classification Object Recognition

124

Paper
Code

A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

2 code implementations • ICLR 2020 • Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, Christopher Pal

We show that causal structures can be parameterized via continuous variables and learned end-to-end.

Meta-Learning

123

Paper
Code

Multi-Fidelity Active Learning with GFlowNets

2 code implementations • 20 Jun 2023 • Alex Hernandez-Garcia, Nikita Saxena, Moksh Jain, Cheng-Hao Liu, Yoshua Bengio

For example, in scientific discovery, we are often faced with the problem of exploring very large, high-dimensional spaces, where querying a high fidelity, black-box objective function is very expensive.

Active Learning

122

Paper
Code

Crystal-GFN: sampling crystals with desirable properties and constraints

1 code implementation • 7 Oct 2023 • Mila AI4Science, Alex Hernandez-Garcia, Alexandre Duval, Alexandra Volokhova, Yoshua Bengio, Divya Sharma, Pierre Luc Carrier, Yasmine Benabed, Michał Koziarski, Victor Schmidt

Accelerating material discovery holds the potential to greatly help mitigate the climate crisis.

Formation Energy

122

Paper
Code

RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs

2 code implementations • ICLR 2021 • Meng Qu, Junkun Chen, Louis-Pascal Xhonneux, Yoshua Bengio, Jian Tang

Then in the E-step, we select a set of high-quality rules from all generated rules with both the rule generator and reasoning predictor via posterior inference; and in the M-step, the rule generator is updated with the rules selected in the E-step.

Knowledge Graphs

118

Paper
Code

BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

2 code implementations • 9 Oct 2014 • Stephan Gouws, Yoshua Bengio, Greg Corrado

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data.

Ranked #1 on Document Classification on Reuters En-De

Cross-Lingual Document Classification Document Classification +3

116

Paper
Code

Recurrent Independent Mechanisms

3 code implementations • ICLR 2021 • Anirudh Goyal, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, Bernhard Schölkopf

Learning modular structures which reflect the dynamics of the environment can lead to better generalization and robustness to changes which only affect a few of the underlying causes.

116

Paper
Code

MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

2 code implementations • 1 Feb 2018 • Konstantinos Drossos, Stylianos Ioannis Mimilakis, Dmitriy Serdyuk, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio

Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep-learning-based methods.

Sound Audio and Speech Processing

111

Paper
Code

Equivalence of Equilibrium Propagation and Recurrent Backpropagation

1 code implementation • 22 Nov 2017 • Benjamin Scellier, Yoshua Bengio

Recurrent Backpropagation and Equilibrium Propagation are supervised learning algorithms for fixed point recurrent neural networks which differ in their second phase.

108

Paper
Code

Equilibrium Propagation: Bridging the Gap Between Energy-Based Models and Backpropagation

2 code implementations • 16 Feb 2016 • Benjamin Scellier, Yoshua Bengio

Because the objective function is defined in terms of local perturbations, the second phase of Equilibrium Propagation corresponds to only nudging the prediction (fixed point, or stationary distribution) towards a configuration that reduces prediction error.

108

Paper
Code

Generalization of Equilibrium Propagation to Vector Field Dynamics

3 code implementations • 14 Aug 2018 • Benjamin Scellier, Anirudh Goyal, Jonathan Binas, Thomas Mesnard, Yoshua Bengio

The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists.

108

Paper
Code

Learning to Understand Phrases by Embedding the Dictionary

2 code implementations • TACL 2016 • Felix Hill, Kyunghyun Cho, Anna Korhonen, Yoshua Bengio

Distributional models that learn rich semantic word representations are a success story of recent NLP research.

General Knowledge

107

Paper
Code

Representation Learning: A Review and New Perspectives

5 code implementations • 24 Jun 2012 • Yoshua Bengio, Aaron Courville, Pascal Vincent

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data.

Density Estimation Representation Learning

106

Paper
Code

Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization

1 code implementation • ICCV 2019 • Md Mahfuzur Rahman Siddiquee, Zongwei Zhou, Nima Tajbakhsh, Ruibin Feng, Michael B. Gotway, Yoshua Bengio, Jianming Liang

Qualitative and quantitative evaluations demonstrate that the proposed method outperforms the state of the art in multi-domain image-to-image translation and that it surpasses predominant weakly-supervised localization methods in both disease detection and localization.

Domain Classification Image-to-Image Translation +1

Paper
Code

Learning Neural Causal Models from Unknown Interventions

2 code implementations • 2 Oct 2019 • Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Hugo Larochelle, Bernhard Schölkopf, Michael C. Mozer, Chris Pal, Yoshua Bengio

Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data.

Meta-Learning

Paper
Code

Learning Neural Causal Models with Active Interventions

1 code implementation • 6 Sep 2021 • Nino Scherrer, Olexa Bilaniuk, Yashas Annadani, Anirudh Goyal, Patrick Schwab, Bernhard Schölkopf, Michael C. Mozer, Yoshua Bengio, Stefan Bauer, Nan Rosemary Ke

Discovering causal structures from data is a challenging inference problem of fundamental importance in all areas of science.

Causal Discovery

Paper
Code

Ant Colony Sampling with GFlowNets for Combinatorial Optimization

2 code implementations • 11 Mar 2024 • Minsu Kim, Sanghyeok Choi, Jiwoo Son, Hyeonah Kim, Jinkyoo Park, Yoshua Bengio

This paper introduces the Generative Flow Ant Colony Sampler (GFACS), a novel neural-guided meta-heuristic algorithm for combinatorial optimization.

Combinatorial Optimization

Paper
Code

On the Spectral Bias of Neural Networks

2 code implementations • ICLR 2019 • Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy.

Paper
Code

Image-to-image translation for cross-domain disentanglement

1 code implementation • NeurIPS 2018 • Abel Gonzalez-Garcia, Joost Van de Weijer, Yoshua Bengio

We compare our model to the state-of-the-art in multi-modal image translation and achieve better results for translation on challenging datasets as well as for cross-domain retrieval on realistic datasets.

Disentanglement Image-to-Image Translation +2

Paper
Code

Amortizing intractable inference in large language models

1 code implementation • 6 Oct 2023 • Edward J. Hu, Moksh Jain, Eric Elmoznino, Younesse Kaddar, Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin

Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions.

Bayesian Inference

Paper
Code

Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction

3 code implementations • ICCV 2019 • Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, Graham W. Taylor

Conditional text-to-image generation is an active area of research, with many possible applications.

Ranked #2 on Text-to-Image Generation on GeNeVA (i-CLEVR)

Text-to-Image Generation

Paper
Code

Towards Gene Expression Convolutions using Gene Interaction Graphs

1 code implementation • 18 Jun 2018 • Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio

We find this approach provides an advantage for particular tasks in a low data regime but is very dependent on the quality of the graph used.

Paper
Code

Generative Flow Networks for Discrete Probabilistic Modeling

2 code implementations • 3 Feb 2022 • Dinghuai Zhang, Nikolay Malkin, Zhen Liu, Alexandra Volokhova, Aaron Courville, Yoshua Bengio

We present energy-based generative flow networks (EB-GFN), a novel probabilistic modeling algorithm for high-dimensional discrete data.

Paper
Code

Bayesian Structure Learning with Generative Flow Networks

1 code implementation • 28 Feb 2022 • Tristan Deleu, António Góis, Chris Emezue, Mansi Rankawat, Simon Lacoste-Julien, Stefan Bauer, Yoshua Bengio

In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) structure of Bayesian networks, from data.

Variational Inference

Paper
Code

A Hitchhiker's Guide to Geometric GNNs for 3D Atomic Systems

1 code implementation • 12 Dec 2023 • Alexandre Duval, Simon V. Mathis, Chaitanya K. Joshi, Victor Schmidt, Santiago Miret, Fragkiskos D. Malliaros, Taco Cohen, Pietro Liò, Yoshua Bengio, Michael Bronstein

In these graphs, the geometric attributes transform according to the inherent physical symmetries of 3D atomic systems, including rotations and translations in Euclidean space, as well as node permutations.

Protein Structure Prediction Specificity

Paper
Code

GraphMix: Improved Training of GNNs for Semi-Supervised Learning

1 code implementation • 25 Sep 2019 • Vikas Verma, Meng Qu, Kenji Kawaguchi, Alex Lamb, Yoshua Bengio, Juho Kannala, Jian Tang

We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization.

Ranked #1 on Node Classification on Pubmed random partition

Generalization Bounds Graph Attention +1

Paper
Code

ClimateGAN: Raising Climate Change Awareness by Generating Images of Floods

2 code implementations • ICLR 2022 • Victor Schmidt, Alexandra Sasha Luccioni, Mélisande Teng, Tianyu Zhang, Alexia Reynaud, Sunand Raghupathi, Gautier Cosne, Adrien Juraver, Vahe Vardanyan, Alex Hernandez-Garcia, Yoshua Bengio

Climate change is a major threat to humanity, and the actions required to prevent its catastrophic consequences include changes in both policy-making and individual behaviour.

Conditional Image Generation Unsupervised Domain Adaptation

Paper
Code

Biological Sequence Design with GFlowNets

1 code implementation • 2 Mar 2022 • Moksh Jain, Emmanuel Bengio, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Bonaventure F. P. Dossou, Chanakya Ekbote, Jie Fu, Tianyu Zhang, Micheal Kilgour, Dinghuai Zhang, Lena Simine, Payel Das, Yoshua Bengio

In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective to obtain a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round.

Active Learning

Paper
Code

Fraternal Dropout

1 code implementation • ICLR 2018 • Konrad Zolna, Devansh Arpit, Dendi Suhubdy, Yoshua Bengio

We show that our regularization term is upper bounded by the expectation-linear dropout objective which has been shown to address the gap due to the difference between the train and inference phases of dropout.

Ranked #28 on Language Modelling on Penn Treebank (Word Level)

Image Captioning Language Modelling

Paper
Code

An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

1 code implementation • 21 Dec 2013 • Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, Yoshua Bengio

Catastrophic forgetting is a problem faced by many machine learning models and algorithms.

BIG-bench Machine Learning

Paper
Code

Training deep neural networks with low precision multiplications

1 code implementation • 22 Dec 2014 • Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David

For each of those datasets and for each of those formats, we assess the impact of the precision of the multiplications on the final error after training.

Paper
Code

Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning

1 code implementation • 26 Apr 2020 • Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Hao-Ran Wei, Shengchao Liu, Karam M. J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua Bengio

Over the last decade, there has been significant progress in the field of machine learning for de novo drug design, particularly in deep generative models.

Drug Discovery Navigate +2

Paper
Code

Learning to Navigate in Synthetically Accessible Chemical Space Using Reinforcement Learning

1 code implementation • ICML 2020 • Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Hao-Ran Wei, Yashaswi Pathak, Shengchao Liu, Simon Blackburn, Karam Thomas, Connor Coley, Jian Tang, Sarath Chandar, Yoshua Bengio

In this work, we propose a novel reinforcement learning (RL) setup for drug discovery that addresses this challenge by embedding the concept of synthetic accessibility directly into the de novo compound design system.

Drug Discovery Navigate +3

Paper
Code

GEO-Bench: Toward Foundation Models for Earth Monitoring

1 code implementation • NeurIPS 2023 • Alexandre Lacoste, Nils Lehmann, Pau Rodriguez, Evan David Sherwin, Hannah Kerner, Björn Lütjens, Jeremy Andrew Irvin, David Dao, Hamed Alemohammad, Alexandre Drouin, Mehmet Gunturkun, Gabriel Huang, David Vazquez, Dava Newman, Yoshua Bengio, Stefano Ermon, Xiao Xiang Zhu

Recent progress in self-supervision has shown that pre-training large neural networks on vast amounts of unsupervised data can lead to substantial increases in generalization to downstream tasks.

Paper
Code

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

2 code implementations • ACL 2018 • Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio

In this work, we propose a novel constituency parsing scheme.

Constituency Parsing Position +1

Paper
Code

Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies

1 code implementation • 12 Feb 2020 • Giulia Zarpellon, Jason Jo, Andrea Lodi, Yoshua Bengio

We aim instead at learning a policy that generalizes across heterogeneous MILPs: our main hypothesis is that parameterizing the state of the B&B search tree can aid this type of generalization.

Imitation Learning

Paper
Code

Equilibrated adaptive learning rates for non-convex optimization

2 code implementations • NeurIPS 2015 • Yann N. Dauphin, Harm de Vries, Yoshua Bengio

Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks.

Paper
Code

A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

1 code implementation • NeurIPS 2021 • Mingde Zhao, Zhen Liu, Sitao Luan, Shuyuan Zhang, Doina Precup, Yoshua Bengio

We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state during planning.

Model-based Reinforcement Learning Out-of-Distribution Generalization +2

Paper
Code

Bayesian Model-Agnostic Meta-Learning

2 code implementations • NeurIPS 2018 • Taesup Kim, Jaesik Yoon, Ousmane Dia, Sungwoong Kim, Yoshua Bengio, Sungjin Ahn

Learning to infer Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning due to the model uncertainty inherent in the problem.

Active Learning Image Classification +2

Paper
Code

Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses

1 code implementation • ACL 2017 • Ryan Lowe, Michael Noseworthy, Iulian V. Serban, Nicolas Angelard-Gontier, Yoshua Bengio, Joelle Pineau

Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem.

Dialogue Evaluation

Paper
Code

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

1 code implementation • 15 Apr 2024 • Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Yoshua Bengio, Danqi Chen, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger

This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs).

Paper
Code

Count-ception: Counting by Fully Convolutional Redundant Counting

2 code implementations • 25 Mar 2017 • Joseph Paul Cohen, Genevieve Boucher, Craig A. Glastonbury, Henry Z. Lo, Yoshua Bengio

Our contribution is redundant counting instead of predicting a density map in order to average over errors.

Object Localization regression

Paper
Code

Improving day-ahead Solar Irradiance Time Series Forecasting by Leveraging Spatio-Temporal Context

1 code implementation • 1 Jun 2023 • Oussama Boussif, Ghait Boukachab, Dan Assouline, Stefano Massaroli, Tianle Yuan, Loubna Benabbou, Yoshua Bengio

Solar power harbors immense potential in mitigating climate change by substantially reducing CO$_{2}$ emissions.

Solar Irradiance Forecasting Time Series +2

Paper
Code

Learning Independent Features with Adversarial Nets for Non-linear ICA

1 code implementation • ICLR 2018 • Philemon Brakel, Yoshua Bengio

We propose to learn independent features with adversarial objectives which optimize such measures implicitly.

Paper
Code

An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming

1 code implementation • 15 May 2021 • Minkai Xu, Wujie Wang, Shitong Luo, Chence Shi, Yoshua Bengio, Rafael Gomez-Bombarelli, Jian Tang

Specifically, the molecular graph is first encoded in a latent space, and then the 3D structures are generated by solving a principled bilevel optimization program.

Bilevel Optimization

Paper
Code

Reweighted Wake-Sleep

2 code implementations • 11 Jun 2014 • Jörg Bornschein, Yoshua Bengio

The wake-sleep algorithm relies on training not just the directed generative model but also a conditional generative model (the inference network) that runs backward from visible to latent, estimating the posterior distribution of latent given visible.

Paper
Code

Combining Modular Skills in Multitask Learning

1 code implementation • 28 Feb 2022 • Edoardo M. Ponti, Alessandro Sordoni, Yoshua Bengio, Siva Reddy

By jointly learning these and a task-skill allocation matrix, the network for each task is instantiated as the average of the parameters of active skills.

Instruction Following reinforcement-learning +1

Paper
Code

Unitary Evolution Recurrent Neural Networks

2 code implementations • 20 Nov 2015 • Martin Arjovsky, Amar Shah, Yoshua Bengio

When the eigenvalues of the hidden to hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies.

Ranked #26 on Sequential Image Classification on Sequential MNIST

Sequential Image Classification

Paper
Code

Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning

1 code implementation • 2 Jul 2021 • Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, Christopher Pal

A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure.

Benchmarking Causal Discovery +4

Paper
Code

DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets

1 code implementation • NeurIPS 2023 • Lazar Atanackovic, Alexander Tong, Bo Wang, Leo J. Lee, Yoshua Bengio, Jason Hartford

In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges.

Bayesian Inference Causal Discovery

Paper
Code

Learning Neural Generative Dynamics for Molecular Conformation Generation

3 code implementations • ICLR 2021 • Minkai Xu, Shitong Luo, Yoshua Bengio, Jian Peng, Jian Tang

Inspired by the recent progress in deep generative models, in this paper, we propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.

valid

Paper
Code

Interactive Language Learning by Question Answering

1 code implementation • IJCNLP 2019 • Xingdi Yuan, Marc-Alexandre Cote, Jie Fu, Zhouhan Lin, Christopher Pal, Yoshua Bengio, Adam Trischler

In QAit, an agent must interact with a partially observable text-based environment to gather information required to answer questions.

Machine Reading Comprehension Question Answering

Paper
Code

Hybrid Models for Learning to Branch

1 code implementation • NeurIPS 2020 • Prateek Gupta, Maxime Gasse, Elias B. Khalil, M. Pawan Kumar, Andrea Lodi, Yoshua Bengio

First, in a more realistic setting where only a CPU is available, is the GNN model still competitive?

Paper
Code

Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets

1 code implementation • 26 May 2023 • Dinghuai Zhang, Hanjun Dai, Nikolay Malkin, Aaron Courville, Yoshua Bengio, Ling Pan

In this paper, we design Markov decision processes (MDPs) for different combinatorial problems and propose to train conditional GFlowNets to sample from the solution space.

Combinatorial Optimization

Paper
Code

Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus

1 code implementation • ACL 2016 • Iulian Vlad Serban, Alberto García-Durán, Caglar Gulcehre, Sungjin Ahn, Sarath Chandar, Aaron Courville, Yoshua Bengio

Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances.

Machine Translation Question Generation +4

Paper
Code

Bidirectional Helmholtz Machines

1 code implementation • 12 Jun 2015 • Jorg Bornschein, Samira Shabanian, Asja Fischer, Yoshua Bengio

We present a lower-bound for the likelihood of this model and we show that optimizing this bound regularizes the model so that the Bhattacharyya distance between the bottom-up and top-down approximate distributions is minimized.

Paper
Code

Tackling Climate Change with Machine Learning

3 code implementations • 10 Jun 2019 • David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio

Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help.

BIG-bench Machine Learning Management

Paper
Code

Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization

2 code implementations • 4 Oct 2023 • Dinghuai Zhang, Ricky T. Q. Chen, Cheng-Hao Liu, Aaron Courville, Yoshua Bengio

We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics.

Paper
Code

Gated Orthogonal Recurrent Units: On Learning to Forget

1 code implementation • 8 Jun 2017 • Li Jing, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio

We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory.

Ranked #7 on Question Answering on bAbi (Accuracy (trained on 1k) metric)

Denoising Question Answering

Paper
Code

FigureQA: An Annotated Figure Dataset for Visual Reasoning

1 code implementation • ICLR 2018 • Samira Ebrahimi Kahou, Vincent Michalski, Adam Atkinson, Akos Kadar, Adam Trischler, Yoshua Bengio

To resolve, such questions often require reference to multiple plot elements and synthesis of information distributed spatially throughout a figure.

Ranked #3 on Visual Question Answering (VQA) on FigureQA - test 1

BIG-bench Machine Learning Chart Question Answering +2

Paper
Code

Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study

1 code implementation • ACL 2019 • Chinnadhurai Sankar, Sandeep Subramanian, Christopher Pal, Sarath Chandar, Yoshua Bengio

Neural generative models have become increasingly popular when building conversational agents.

Paper
Code

HNHN: Hypergraph Networks with Hyperedge Neurons

1 code implementation • 22 Jun 2020 • Yihe Dong, Will Sawin, Yoshua Bengio

Hypergraphs provide a natural representation for many real world datasets.

Hypergraph representations Representation Learning

Paper
Code

Diet Networks: Thin Parameters for Fat Genomics

5 code implementations • 28 Nov 2016 • Adriana Romero, Pierre Luc Carrier, Akram Erraqabi, Tristan Sylvain, Alex Auvolat, Etienne Dejoie, Marc-André Legault, Marie-Pierre Dubé, Julie G. Hussin, Yoshua Bengio

It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units).

Parameter Prediction

Paper
Code

Z-Forcing: Training Stochastic Recurrent Networks

1 code implementation • NeurIPS 2017 • Anirudh Goyal, Alessandro Sordoni, Marc-Alexandre Côté, Nan Rosemary Ke, Yoshua Bengio

Stochastic recurrent models have been successful in capturing the variability observed in natural sequential data such as speech.

Language Modelling Variational Inference

Paper
Code

A Closer Look at Memorization in Deep Networks

2 code implementations • ICML 2017 • Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien

We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness.

Adversarial Robustness Memorization

Paper
Code

Difference Target Propagation

1 code implementation • 23 Dec 2014 • Dong-Hyun Lee, Saizheng Zhang, Asja Fischer, Yoshua Bengio

Back-propagation has been the workhorse of recent successes of deep learning but it relies on infinitesimal effects (partial derivatives) in order to perform credit assignment.

Paper
Code

On Adversarial Mixup Resynthesis

1 code implementation • NeurIPS 2019 • Christopher Beckham, Sina Honari, Vikas Verma, Alex Lamb, Farnoosh Ghadiri, R. Devon Hjelm, Yoshua Bengio, Christopher Pal

In this paper, we explore new approaches to combining information encoded within the learned representations of auto-encoders.

Resynthesis

Paper
Code

Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures

1 code implementation • 3 Sep 2019 • Jordan Hoffmann, Louis Maestrati, Yoshihide Sawada, Jian Tang, Jean Michel Sellier, Yoshua Bengio

We present a method to encode and decode the position of atoms in 3-D molecules from a dataset of nearly 50,000 stable crystal unit cells that vary from containing 1 to over 100 atoms.

Decoder Drug Discovery +1

Paper
Code

DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning

2 code implementations • ECCV 2020 • Timo Milbich, Karsten Roth, Homanga Bharadhwaj, Samarth Sinha, Yoshua Bengio, Björn Ommer, Joseph Paul Cohen

Visual Similarity plays an important role in many computer vision applications.

Ranked #13 on Metric Learning on CUB-200-2011 (using extra training data)

Metric Learning

Paper
Code

Light Gated Recurrent Units for Speech Recognition

1 code implementation • 26 Mar 2018 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR).

Ranked #6 on Speech Recognition on TIMIT

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Code

Improving speech recognition by revising gated recurrent units

1 code implementation • 29 Sep 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

First, we suggest removing the reset gate in the GRU design, resulting in a more efficient single-gate architecture.

speech-recognition Speech Recognition

Paper
Code

Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

1 code implementation • 21 Oct 2019 • Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

We release the largest public ECG dataset of continuous raw signals for representation learning containing 11 thousand patients and 2 billion labelled beats.

Clustering Representation Learning

Paper
Code

MAgNet: Mesh Agnostic Neural PDE Solver

1 code implementation • 11 Oct 2022 • Oussama Boussif, Dan Assouline, Loubna Benabbou, Yoshua Bengio

The computational complexity of classical numerical methods for solving Partial Differential Equations (PDE) scales significantly as the resolution increases.

Zero-shot Generalization

Paper
Code

Is a Modular Architecture Enough?

1 code implementation • 6 Jun 2022 • Sarthak Mittal, Yoshua Bengio, Guillaume Lajoie

Inspired by human cognition, machine learning systems are gradually revealing advantages of sparser and more modular architectures.

Out-of-Distribution Generalization

Paper
Code

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

2 code implementations • 15 Aug 2022 • Tianyu Zhang, Andrew Williams, Soham Phade, Sunil Srinivasa, Yang Zhang, Prateek Gupta, Yoshua Bengio, Stephan Zheng

To facilitate this research, here we introduce RICE-N, a multi-region integrated assessment model that simulates the global climate and economy, and which can be used to design and evaluate the strategic outcomes for different negotiation and agreement frameworks.

Ethics Multi-agent Reinforcement Learning

Paper
Code

GFlowNet-EM for learning compositional latent variable models

1 code implementation • 13 Feb 2023 • Edward J. Hu, Nikolay Malkin, Moksh Jain, Katie Everett, Alexandros Graikos, Yoshua Bengio

Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents.

Variational Inference

Paper
Code

Machine Learning for Glacier Monitoring in the Hindu Kush Himalaya

1 code implementation • 9 Dec 2020 • Shimaa Baraka, Benjamin Akera, Bibek Aryal, Tenzing Sherpa, Finu Shresta, Anthony Ortiz, Kris Sankaran, Juan Lavista Ferres, Mir Matin, Yoshua Bengio

Glacier mapping is key to ecological monitoring in the Hindu Kush Himalaya (HKH) region.

BIG-bench Machine Learning

Paper
Code

Multi-Image Super-Resolution for Remote Sensing using Deep Recurrent Networks

1 code implementation • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020 • Md Rifat Arefin, Vincent Michalski, Pierre-Luc St-Charles, Alfredo Kalaitzis, Sookyung Kim, Samira E. Kahou, Yoshua Bengio

High-resolution satellite imagery is critical for various earth observation applications related to environment monitoring, geoscience, forecasting, and land use analysis.

Decoder Earth Observation +1

Paper
Code

FloW: A Dataset and Benchmark for Floating Waste Detection in Inland Waters

1 code implementation • ICCV 2021 • Yuwei Cheng, Jiannan Zhu, Mengxin Jiang, Jie Fu, Changsong Pang, Peidong Wang, Kris Sankaran, Olawale Onabola, Yimin Liu, Dianbo Liu, Yoshua Bengio

To promote practical applications of autonomous floating waste cleaning, we present FloW, the first dataset for floating waste detection in inland water areas.

object-detection Robust Object Detection

Paper
Code

Towards end-to-end spoken language understanding

1 code implementation • 23 Feb 2018 • Dmitriy Serdyuk, Yongqiang Wang, Christian Fuegen, Anuj Kumar, Baiyang Liu, Yoshua Bengio

A spoken language understanding system is traditionally designed as a pipeline of a number of components.

Natural Language Understanding Spoken Language Understanding

Paper
Code

Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net

1 code implementation • NeurIPS 2017 • Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio

The energy function is then modified so the model and data distributions match, with no guarantee on the number of steps required for the Markov chain to converge.

Paper
Code

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

4 code implementations • NeurIPS 2014 • Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.

Paper
Code

Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

1 code implementation • ICML 2020 • Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

To effectively utilize the wealth of potential top-down information available, and to prevent the cacophony of intermixed signals in a bidirectional architecture, mechanisms are needed to restrict information flow.

Language Modelling Open-Ended Question Answering +2

Paper
Code

DEUP: Direct Epistemic Uncertainty Prediction

1 code implementation • 16 Feb 2021 • Salem Lahlou, Moksh Jain, Hadi Nekoei, Victor Ion Butoi, Paul Bertin, Jarrid Rector-Brooks, Maksym Korablyov, Yoshua Bengio

Epistemic Uncertainty is a measure of the lack of knowledge of a learner which diminishes with more evidence.

Active Learning Image Classification +2

Paper
Code

The Causal-Neural Connection: Expressiveness, Learnability, and Inference

2 code implementations • NeurIPS 2021 • Kevin Xia, Kai-Zhan Lee, Yoshua Bengio, Elias Bareinboim

Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM.

Causal Identification Causal Inference +1

Paper
Code

MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

1 code implementation • 27 Dec 2022 • Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels.

Data Augmentation

Paper
Code

Predictive Inference with Feature Conformal Prediction

1 code implementation • 1 Oct 2022 • Jiaye Teng, Chuan Wen, Dinghuai Zhang, Yoshua Bengio, Yang Gao, Yang Yuan

Conformal prediction is a distribution-free technique for establishing valid prediction intervals.

Conformal Prediction Image Segmentation +5

Paper
Code

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

1 code implementation • 9 Feb 2024 • Tara Akhound-Sadegh, Jarrid Rector-Brooks, Avishek Joey Bose, Sarthak Mittal, Pablo Lemos, Cheng-Hao Liu, Marcin Sendera, Siamak Ravanbakhsh, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Alexander Tong

Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science.

Denoising Efficient Exploration

Paper
Code

Iterative Neural Autoregressive Distribution Estimator (NADE-k)

1 code implementation • 5 Jun 2014 • Tapani Raiko, Li Yao, Kyunghyun Cho, Yoshua Bengio

Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data.

Ranked #7 on Image Generation on Binarized MNIST

Density Estimation Image Generation +1

Paper
Code

Iterative Neural Autoregressive Distribution Estimator NADE-k

1 code implementation • NeurIPS 2014 • Tapani Raiko, Yao Li, Kyunghyun Cho, Yoshua Bengio

Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data.

Ranked #8 on Image Generation on Binarized MNIST

Density Estimation Image Generation +1

Paper
Code

RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro

1 code implementation • 7 Feb 2022 • Paul Bertin, Jarrid Rector-Brooks, Deepak Sharma, Thomas Gaudelet, Andrew Anighoro, Torsten Gross, Francisco Martinez-Pena, Eileen L. Tang, Suraj M S, Cristian Regep, Jeremy Hayter, Maksym Korablyov, Nicholas Valiante, Almer van der Sloot, Mike Tyers, Charles Roberts, Michael M. Bronstein, Luke L. Lairson, Jake P. Taylor-King, Yoshua Bengio

For large libraries of small molecules, exhaustive combinatorial chemical screens become infeasible to perform when considering a range of disease models, assay conditions, and dose ranges.

Benchmarking Model Optimization

Paper
Code

Interventional Causal Representation Learning

1 code implementation • 24 Sep 2022 • Kartik Ahuja, Divyat Mahajan, Yixin Wang, Yoshua Bengio

Can interventional data facilitate causal representation learning?

Representation Learning

Paper
Code

Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics

1 code implementation • NeurIPS 2019 • Giancarlo Kerg, Kyle Goyette, Maximilian Puelma Touzel, Gauthier Gidel, Eugene Vorontsov, Yoshua Bengio, Guillaume Lajoie

A recent strategy to circumvent the exploding and vanishing gradient problem in RNNs, and to allow the stable propagation of signals over long time scales, is to constrain recurrent connectivity matrices to be orthogonal or unitary.

Paper
Code
