14 code implementations • 24 Jun 2012 • Yoshua Bengio
Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters.
90 code implementations • ICLR 2018 • Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio
We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.
Ranked #1 on Node Classification on Pubmed (Validation metric)
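The masked self-attention mentioned in the abstract above can be illustrated with a minimal sketch: each node normalizes attention scores over its own neighborhood only, so non-neighbors receive no weight. The dot-product scoring, the toy graph, and the function names below are assumptions made for illustration; they are not the scoring function or the code of the paper.

```python
import math

def masked_attention(features, neighbors):
    """Softmax-normalize scores over each node's neighborhood only (the mask)."""
    weights = {}
    for i, nbrs in neighbors.items():
        # Assumed scoring function: plain dot product between node features.
        raw = {j: sum(a * b for a, b in zip(features[i], features[j])) for j in nbrs}
        m = max(raw.values())  # subtract the max for numerical stability
        exps = {j: math.exp(s - m) for j, s in raw.items()}
        total = sum(exps.values())
        weights[i] = {j: e / total for j, e in exps.items()}
    return weights

def aggregate(features, weights):
    """Each node's new representation is the attention-weighted sum of its neighbors."""
    out = {}
    for i, w in weights.items():
        dim = len(features[i])
        out[i] = [sum(w[j] * features[j][k] for j in w) for k in range(dim)]
    return out

# Hypothetical 3-node graph with self-loops; node 0 and node 2 are not connected.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}

attn = masked_attention(features, neighbors)
new_features = aggregate(features, attn)
```

Because the normalization runs only over `neighbors[i]`, node 2 never contributes to node 0, whatever its features are.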
183 code implementations • Proceedings of the 27th International Conference on Neural Information Processing Systems 2014 • Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.
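As a rough illustration of the two-player setup described above, the sketch below evaluates the standard minimax value V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] on made-up discriminator outputs. The formula is the usual statement of the adversarial objective rather than something quoted in this excerpt, and all numbers are hypothetical.

```python
import math

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: D(x) for samples x from the training data.
    d_fake: D(G(z)) for samples produced by the generator.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical discriminator outputs (probabilities that a sample is real).
d_real = [0.9, 0.8, 0.95]
d_fake_weak_g = [0.1, 0.2, 0.05]    # D easily rejects these generated samples
d_fake_strong_g = [0.5, 0.5, 0.5]   # D cannot tell these from real data

v_weak = value_function(d_real, d_fake_weak_g)
v_strong = value_function(d_real, d_fake_strong_g)
```

D is trained to increase this value and G to decrease it, so a generator that makes D err more often (`d_fake_strong_g`) yields the lower value.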
1 code implementation • ICLR 2020 • William Fedus, Carles Gelada, Yoshua Bengio, Marc G. Bellemare, Hugo Larochelle
Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process.
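For readers unfamiliar with the term, a minimal sketch of how a fixed discount factor enters the return is given below. It shows only the conventional exponentially discounted sum; the function name and the reward values are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t], computed backwards in one pass."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical reward sequence: 1 + 0.5 * 1 + 0.25 * 1
example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```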
1 code implementation • 28 Feb 2020 • William Fedus, Dibya Ghosh, John D. Martin, Marc G. Bellemare, Yoshua Bengio, Hugo Larochelle
Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.
2 code implementations • ICML 2020 • William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding.
21 code implementations • NeurIPS 2019 • Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville
In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by introducing a set of architectural changes and simple training techniques.
6 code implementations • 27 Feb 2017 • R. Devon Hjelm, Athul Paul Jacob, Tong Che, Adam Trischler, Kyunghyun Cho, Yoshua Bengio
We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator.
121 code implementations • 1 Sep 2014 • Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
Neural machine translation is a recently proposed approach to machine translation.
Ranked #4 on Dialogue Generation on Persona-Chat (using extra training data)
43 code implementations • 3 Jun 2014 • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN).
Ranked #47 on Machine Translation on WMT2014 English-French
11 code implementations • ICLR 2019 • Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, R. Devon Hjelm
We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner.
Ranked #49 on Node Classification on Citeseer
1 code implementation • 9 May 2016 • The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang
Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.
52 code implementations • 9 Mar 2017 • Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bo-Wen Zhou, Yoshua Bengio
This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention.
4 code implementations • 8 Jun 2021 • Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato de Mori, Yoshua Bengio
SpeechBrain is an open-source and all-in-one speech toolkit.
3 code implementations • ICLR 2022 • Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie
Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed.
19 code implementations • ICLR 2020 • Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio
We focus on solving the univariate time series point forecasting problem using deep learning.
5 code implementations • NeurIPS 2015 • Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David
We introduce BinaryConnect, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated.
Ranked #30 on Image Classification on SVHN
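The split between binary weights used for propagation and full-precision stored weights can be sketched for a single linear unit as follows. The sign binarization, the squared-error loss, the clipping of stored weights to [-1, 1], and all names and values are assumptions chosen for illustration, not the authors' implementation.

```python
def binarize(weights):
    """Deterministic binarization: map each real weight to +1 or -1 by its sign."""
    return [1.0 if w >= 0 else -1.0 for w in weights]

def clip(value, lo=-1.0, hi=1.0):
    return max(lo, min(hi, value))

def train_step(w_real, x, target, lr=0.1):
    """One SGD step: propagate with binary weights, accumulate into real weights."""
    w_bin = binarize(w_real)
    y = sum(wb * xi for wb, xi in zip(w_bin, x))   # forward pass uses binary weights
    err = y - target                               # gradient of 0.5 * (y - target)**2 w.r.t. y
    grads = [err * xi for xi in x]                 # gradient w.r.t. the binary weights
    # The update is applied to the stored real-valued weights, then clipped.
    return [clip(w - lr * g) for w, g in zip(w_real, grads)]

# Hypothetical toy problem: with x = [1, 2], the target 3 is reached by weights [+1, +1].
w_real = [0.05, -0.3]
x, target = [1.0, 2.0], 3.0
for _ in range(5):
    w_real = train_step(w_real, x, target)
w_bin = binarize(w_real)
```

Small gradient contributions accumulate in `w_real` and flip a binary weight only once the stored value changes sign, which is why the stored weights keep their precision.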
26 code implementations • 9 Feb 2016 • Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio
We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time.
6 code implementations • 24 May 2020 • Joseph Paul Cohen, Lan Dao, Paul Morrison, Karsten Roth, Yoshua Bengio, Beiyi Shen, Almas Abbasi, Mahsa Hoshmand-Kochi, Marzyeh Ghassemi, Haifang Li, Tim Q Duong
In this study, we present a severity score prediction model for COVID-19 pneumonia for frontal chest X-ray images.
6 code implementations • 20 Aug 2013 • Ian J. Goodfellow, David Warde-Farley, Pascal Lamblin, Vincent Dumoulin, Mehdi Mirza, Razvan Pascanu, James Bergstra, Frédéric Bastien, Yoshua Bengio
Pylearn2 is a machine learning research library.
88 code implementations • 10 Feb 2015 • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio
Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images.
2 code implementations • 30 May 2022 • Aniket Didolkar, Kshitij Gupta, Anirudh Goyal, Nitesh B. Gundavarapu, Alex Lamb, Nan Rosemary Ke, Yoshua Bengio
A slow stream that is recurrent in nature aims to learn a specialized and compressed representation, by forcing chunks of $K$ time steps into a single representation which is divided into multiple vectors.
6 code implementations • NeurIPS 2019 • Ankesh Anand, Evan Racah, Sherjil Ozair, Yoshua Bengio, Marc-Alexandre Côté, R. Devon Hjelm
State representation learning, or the ability to capture latent generative factors of an environment, is crucial for building intelligent agents that can perform a wide variety of tasks.
17 code implementations • 2 Mar 2020 • Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, Xavier Bresson
In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.
Ranked #1 on Link Prediction on COLLAB
3 code implementations • ICLR 2019 • Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato de Mori, Yoshua Bengio
Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence.
2 code implementations • 15 Apr 2018 • Mirco Ravanelli, Dmitriy Serdyuk, Yoshua Bengio
Online speech recognition is crucial for developing natural human-machine interfaces.
2 code implementations • ICLR 2018 • Dmitriy Serdyuk, Nan Rosemary Ke, Alessandro Sordoni, Adam Trischler, Chris Pal, Yoshua Bengio
We propose a simple technique for encouraging generative RNNs to plan ahead.
11 code implementations • 19 Nov 2018 • Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio
Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.
Ranked #1 on Distant Speech Recognition on DIRHA English WSJ
1 code implementation • 23 Nov 2018 • Mirco Ravanelli, Yoshua Bengio
Deep learning is currently playing a crucial role toward higher levels of artificial intelligence.
Ranked #3 on Distant Speech Recognition on DIRHA English WSJ
4 code implementations • ICLR 2018 • Sandeep Subramanian, Adam Trischler, Yoshua Bengio, Christopher J. Pal
In this work, we present a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model.
Ranked #1 on Semantic Textual Similarity on SentEval
6 code implementations • ICLR 2019 • Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio
Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts.
1 code implementation • ICLR 2020 • Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine
This is typically the case when we have a standard conditioning input, such as a state observation, and a "privileged" input, which might correspond to the goal of a task, the output of a costly planning algorithm, or communication with another agent.
5 code implementations • 22 Sep 2016 • Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio
Quantized recurrent neural networks were tested on the Penn Treebank dataset and achieved accuracy comparable to that of their 32-bit counterparts using only 4 bits.
5 code implementations • 14 Sep 2019 • Tristan Deleu, Tobias Würfl, Mandana Samiei, Joseph Paul Cohen, Yoshua Bengio
The constant introduction of standardized benchmarks in the literature has helped accelerate the recent advances in meta-learning research.
4 code implementations • NeurIPS 2019 • Rahaf Aljundi, Min Lin, Baptiste Goujaud, Yoshua Bengio
To prevent forgetting, a replay buffer is usually employed to store the previous data for the purpose of rehearsal.
1 code implementation • IJCNLP 2015 • Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio
The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models.
2 code implementations • NeurIPS 2021 • Mohammad Pezeshki, Sékou-Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie
We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
2 code implementations • NeurIPS 2021 • Kartik Ahuja, Ethan Caballero, Dinghuai Zhang, Jean-Christophe Gagnon-Audet, Yoshua Bengio, Ioannis Mitliagkas, Irina Rish
To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD.
3 code implementations • 19 Dec 2014 • Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio
In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.
14 code implementations • NeurIPS 2015 • Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio
Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation.
Ranked #17 on Speech Recognition on TIMIT
5 code implementations • 1 Jun 2015 • Bart van Merriënboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, Yoshua Bengio
We introduce two Python frameworks to train neural networks on large datasets: Blocks and Fuel.
26 code implementations • 29 Jul 2018 • Mirco Ravanelli, Yoshua Bengio
Rather than employing standard hand-crafted features, the latter CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.
2 code implementations • 13 Dec 2018 • Mirco Ravanelli, Yoshua Bengio
Deep neural networks can learn complex and abstract representations that are progressively obtained by combining simpler ones.
5 code implementations • 21 Feb 2023 • Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré
Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale.
Ranked #37 on Language Modelling on WikiText-103
9 code implementations • ICLR 2019 • R. Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio
In this work, we perform unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder.
2 code implementations • 1 Feb 2023 • Alexander Tong, Kilian Fatras, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, Yoshua Bengio
CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models.
1 code implementation • 7 Jul 2023 • Alexander Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yanlei Zhang, Guillaume Huguet, Guy Wolf, Yoshua Bengio
We present simulation-free score and flow matching ([SF]$^2$M), a simulation-free objective for inferring stochastic dynamics given unpaired samples drawn from arbitrary source and target distributions.
9 code implementations • ICLR 2018 • Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J. Pal
Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models.
Ranked #3 on Music Transcription on MusicNet
1 code implementation • 24 Sep 2019 • David Venuto, Leonard Boussioux, Junhao Wang, Rola Dali, Jhelum Chakravorty, Yoshua Bengio, Doina Precup
We define avoidance learning as the process of optimizing the agent's reward while avoiding dangerous behaviors given by a demonstrator.
3 code implementations • 24 Jul 2020 • David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio
This increases reinforcement learning sample efficiency by up to 3 times and improves imitation learning performance on the hardest level from 77% to 90.4%.
3 code implementations • 24 Jul 2016 • Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio
We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL).
Ranked #8 on Machine Translation on IWSLT2015 English-German
19 code implementations • 30 Oct 2014 • Laurent Dinh, David Krueger, Yoshua Bengio
It is based on the idea that a good representation is one in which the data has a distribution that is easy to model.
Ranked #73 on Image Generation on CIFAR-10 (bits/dimension metric)
7 code implementations • 18 Feb 2013 • Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio
We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout.
Ranked #35 on Image Classification on MNIST
12 code implementations • ICLR 2019 • Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, Aaron Courville, David Lopez-Paz, Yoshua Bengio
Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples.
Ranked #18 on Image Classification on OmniBenchmark
4 code implementations • NeurIPS 2021 • Emmanuel Bengio, Moksh Jain, Maksym Korablyov, Doina Precup, Yoshua Bengio
Using insights from Temporal Difference learning, we propose GFlowNet, based on a view of the generative process as a flow network, making it possible to handle the tricky case where different trajectories can yield the same final state, e.g., there are many ways to sequentially add atoms to generate some molecular graph.
3 code implementations • 31 Jan 2022 • Nikolay Malkin, Moksh Jain, Emmanuel Bengio, Chen Sun, Yoshua Bengio
Generative flow networks (GFlowNets) are a method for learning a stochastic policy for generating compositional objects, such as graphs or strings, from a given unnormalized density by sequences of actions, where many possible action sequences may lead to the same object.
3 code implementations • 26 Sep 2022 • Kanika Madan, Jarrid Rector-Brooks, Maksym Korablyov, Emmanuel Bengio, Moksh Jain, Andrei Nica, Tom Bosc, Yoshua Bengio, Nikolay Malkin
Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks.
1 code implementation • CVPR 2017 • Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski
PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable "condition" network C that tells the generator what to draw.
4 code implementations • 22 Dec 2016 • Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, Yoshua Bengio
In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time.
2 code implementations • NeurIPS 2023 • Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, Chris Ré
Leveraging Hyena's new long-range capabilities, we present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single nucleotide-level - an up to 500x increase over previous dense attention-based models.
22 code implementations • 28 Nov 2016 • Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio
State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs).
Ranked #9 on Semantic Segmentation on CamVid
1 code implementation • 12 Sep 2018 • Vincent François-Lavet, Yoshua Bengio, Doina Precup, Joelle Pineau
In the quest for efficient and robust reinforcement learning methods, both model-free and model-based approaches offer advantages.
1 code implementation • 1 Mar 2016 • Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio
Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only).
1 code implementation • ICLR 2019 • Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Aaron Courville, Ioannis Mitliagkas, Yoshua Bengio
Because the hidden states are learned, this has the important effect of encouraging the hidden states for a class to be concentrated in such a way that interpolations within the same class or between two different classes do not intersect with the real data points from other classes.
1 code implementation • 7 Jun 2016 • Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio
We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document.
Ranked #3 on Question Answering on Children's Book Test (Accuracy-NE metric)
1 code implementation • 6 Apr 2019 • Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio
Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure.
Ranked #2 on Distant Speech Recognition on DIRHA English WSJ
1 code implementation • 25 Jan 2020 • Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio
We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.
11 code implementations • 1 Jul 2013 • Ian J. Goodfellow, Dumitru Erhan, Pierre Luc Carrier, Aaron Courville, Mehdi Mirza, Ben Hamner, Will Cukierski, Yichuan Tang, David Thaler, Dong-Hyun Lee, Yingbo Zhou, Chetan Ramaiah, Fangxiang Feng, Ruifan Li, Xiaojie Wang, Dimitris Athanasakis, John Shawe-Taylor, Maxim Milakov, John Park, Radu Ionescu, Marius Popescu, Cristian Grozea, James Bergstra, Jingjing Xie, Lukasz Romaszko, Bing Xu, Zhang Chuang, Yoshua Bengio
The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge.
Ranked #12 on Facial Expression Recognition (FER) on FER2013
1 code implementation • EMNLP 2018 • Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning
Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers.
Ranked #34 on Question Answering on HotpotQA
1 code implementation • 15 May 2019 • Meng Qu, Yoshua Bengio, Jian Tang
Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training.
2 code implementations • 11 Oct 2015 • Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio
For most deep learning algorithms training is notoriously time consuming.
13 code implementations • 11 Dec 2014 • Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
In this paper we compare different types of recurrent units in recurrent neural networks (RNNs).
Ranked #10 on Music Modeling on JSB Chorales
1 code implementation • NeurIPS 2014 • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
We propose a new framework for estimating generative models via adversarial nets, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.
3 code implementations • 5 Jun 2013 • Yoshua Bengio, Éric Thibodeau-Laufer, Guillaume Alain, Jason Yosinski
We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood.
1 code implementation • 20 Nov 2015 • Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, Yoshua Bengio
This leads the model to update using an unbiased estimate of the gradient which also has minimum variance when the sampling proposal is proportional to the L2-norm of the gradient.
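The unbiasedness claim above can be checked on a toy example. In the sketch below, example i is drawn with probability p_i proportional to the L2 norm of its gradient, and the sampled gradient is rescaled by 1 / (N p_i) so that its expectation equals the plain average gradient. The per-example gradients are made-up values, every norm is assumed to be nonzero, and the function names are hypothetical.

```python
import math
import random

def l2_norm(g):
    return math.sqrt(sum(v * v for v in g))

def proposal(grads):
    """Sampling probabilities proportional to each example's gradient L2 norm."""
    norms = [l2_norm(g) for g in grads]
    total = sum(norms)
    return [n / total for n in norms]

def sampled_estimate(grads, probs, rng):
    """Draw one example from the proposal and apply the importance weight."""
    n = len(grads)
    i = rng.choices(range(n), weights=probs, k=1)[0]
    scale = 1.0 / (n * probs[i])  # assumes probs[i] > 0
    return [scale * v for v in grads[i]]

def expected_estimate(grads, probs):
    """Exact expectation of sampled_estimate over the proposal distribution."""
    n, dim = len(grads), len(grads[0])
    return [sum(probs[i] * grads[i][k] / (n * probs[i]) for i in range(n))
            for k in range(dim)]

# Hypothetical per-example gradients with norms 5, 1 and 1.
grads = [[3.0, 4.0], [0.0, 1.0], [-1.0, 0.0]]
probs = proposal(grads)
mean_grad = [sum(g[k] for g in grads) / len(grads) for k in range(2)]
expectation = expected_estimate(grads, probs)
one_draw = sampled_estimate(grads, probs, random.Random(0))
```

With this proposal every rescaled sample has the same L2 norm (here 7/3), which gives some intuition for why the choice keeps the variance of the estimate low.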
21 code implementations • 12 Jan 2018 • Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R. Devon Hjelm
We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.
6 code implementations • 3 Jun 2016 • David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal
We propose zoneout, a novel method for regularizing RNNs.
9 code implementations • 19 May 2016 • Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio
Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue.
4 code implementations • 2 Jun 2016 • Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bo-Wen Zhou, Yoshua Bengio, Aaron Courville
We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens.
Ranked #1 on Dialogue Generation on Ubuntu Dialogue (Activity)
7 code implementations • 17 Jul 2015 • Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau
We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models.
15 code implementations • 13 May 2015 • Mohammad Havaei, Axel Davy, David Warde-Farley, Antoine Biard, Aaron Courville, Yoshua Bengio, Chris Pal, Pierre-Marc Jodoin, Hugo Larochelle
Finally, we explore a cascade architecture in which the output of a basic CNN is treated as an additional source of information for a subsequent CNN.
Ranked #1 on Brain Tumor Segmentation on BRATS-2013 leaderboard
5 code implementations • NeurIPS 2015 • Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio
In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder.
1 code implementation • ICLR 2020 • Michel Deudon, Alfredo Kalaitzis, Md Rifat Arefin, Israel Goytom, Zhichao Lin, Kris Sankaran, Vincent Michalski, Samira E. Kahou, Julien Cornebise, Yoshua Bengio
Multi-frame Super-Resolution (MFSR) offers a more grounded approach to the ill-posed problem, by conditioning on multiple low-resolution views.
Ranked #6 on Multi-Frame Super-Resolution on PROBA-V
2 code implementations • 15 Feb 2020 • Michel Deudon, Alfredo Kalaitzis, Israel Goytom, Md Rifat Arefin, Zhichao Lin, Kris Sankaran, Vincent Michalski, Samira E. Kahou, Julien Cornebise, Yoshua Bengio
Multi-frame Super-Resolution (MFSR) offers a more grounded approach to the ill-posed problem, by conditioning on multiple low-resolution views.
Ranked #6 on Multi-Frame Super-Resolution on PROBA-V
1 code implementation • 14 Nov 2015 • Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio
The task of associating images and videos with a natural language description has attracted a great amount of attention recently.
1 code implementation • 18 Aug 2015 • Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio
Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs).
1 code implementation • 19 Nov 2015 • Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio
Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss.
1 code implementation • 31 Jul 2015 • Alexandre de Brébisson, Étienne Simon, Alex Auvolat, Pascal Vincent, Yoshua Bengio
We describe our first-place solution to the ECML/PKDD discovery challenge on taxi destination prediction.
4 code implementations • 17 Dec 2014 • Grégoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio
Sentiment analysis is a common task in natural language processing that aims to detect polarity of a text document (typically a consumer review).
1 code implementation • ICLR 2022 • Vijay Prakash Dwivedi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, Xavier Bresson
An approach to tackle this issue is to introduce Positional Encoding (PE) of nodes, and inject it into the input layer, like in Transformers.
Ranked #12 on Graph Regression on ZINC-500k
1 code implementation • 7 Apr 2019 • Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio
Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model.
Ranked #15 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)
1 code implementation • ICLR 2021 • Ossama Ahmed, Frederik Träuble, Anirudh Goyal, Alexander Neitz, Yoshua Bengio, Bernhard Schölkopf, Manuel Wüthrich, Stefan Bauer
To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment.
3 code implementations • 7 Feb 2020 • Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio
Can meta-learning discover generic ways of processing time series (TS) from a diverse dataset so as to greatly improve generalization on new TS coming from different datasets?
2 code implementations • 17 Nov 2021 • Yoshua Bengio, Salem Lahlou, Tristan Deleu, Edward J. Hu, Mo Tiwari, Emmanuel Bengio
Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function.
2 code implementations • 24 May 2023 • Salem Lahlou, Joseph D. Viviano, Victor Schmidt, Yoshua Bengio
The growing popularity of generative flow networks (GFlowNets or GFNs) from a range of researchers with diverse backgrounds and areas of expertise necessitates a library which facilitates the testing of new features such as training losses that can be easily compared to standard benchmark implementations, or on a set of common environments.
1 code implementation • ICLR 2022 • Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio
We show that simple pitch and periodicity conditioning is insufficient for reducing this error relative to using autoregression.
4 code implementations • 8 Jul 2015 • Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob G. Simonsen, Jian-Yun Nie
Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity.
1 code implementation • NeurIPS 2017 • Francis Dutil, Caglar Gulcehre, Adam Trischler, Yoshua Bengio
We investigate the integration of a planning mechanism into sequence-to-sequence models using attention.
1 code implementation • 13 Jun 2017 • Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio
We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation.
2 code implementations • ACL 2016 • Junyoung Chung, Kyunghyun Cho, Yoshua Bengio
The existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation.
Ranked #3 on Machine Translation on WMT2015 English-German
3 code implementations • ICLR 2019 • Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon
Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks.
2 code implementations • 3 Sep 2014 • Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio
In this paper, we focus on analyzing the properties of neural machine translation using two models: the RNN Encoder-Decoder and a newly proposed gated recursive convolutional neural network.
3 code implementations • 16 Jun 2019 • Alex Lamb, Vikas Verma, Kenji Kawaguchi, Alexander Matyasko, Savya Khosla, Juho Kannala, Yoshua Bengio
Adversarial robustness has become a central goal in deep learning, in both theory and practice.
2 code implementations • 24 Jan 2019 • Rithesh Kumar, Sherjil Ozair, Anirudh Goyal, Aaron Courville, Yoshua Bengio
Maximum likelihood estimation of energy-based models is a challenging problem due to the intractability of the log-likelihood gradient.
4 code implementations • 9 Mar 2019 • Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Arno Solin, Yoshua Bengio, David Lopez-Paz
We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm.
3 code implementations • 6 Sep 2016 • Junyoung Chung, Sungjin Ahn, Yoshua Bengio
Multiscale recurrent neural networks have been considered a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of model can actually capture the temporal dependencies by discovering the latent hierarchical structure of the sequence.
Ranked #19 on Language Modelling on Text8
2 code implementations • 22 Nov 2015 • Francesco Visin, Marco Ciccone, Adriana Romero, Kyle Kastner, Kyunghyun Cho, Yoshua Bengio, Matteo Matteucci, Aaron Courville
Moreover, ReNet layers are stacked on top of pre-trained convolutional layers, benefiting from generic local features.
Ranked #18 on Semantic Segmentation on CamVid
4 code implementations • 3 May 2015 • Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio
In this paper, we propose a deep neural network architecture for object recognition based on recurrent neural networks.
Ranked #35 on Image Classification on MNIST
2 code implementations • ICLR 2020 • Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, Christopher Pal
We show that causal structures can be parameterized via continuous variables and learned end-to-end.
2 code implementations • 20 Jun 2023 • Alex Hernandez-Garcia, Nikita Saxena, Moksh Jain, Cheng-Hao Liu, Yoshua Bengio
For example, in scientific discovery, we are often faced with the problem of exploring very large, high-dimensional spaces, where querying a high fidelity, black-box objective function is very expensive.
1 code implementation • 7 Oct 2023 • Mila AI4Science, Alex Hernandez-Garcia, Alexandre Duval, Alexandra Volokhova, Yoshua Bengio, Divya Sharma, Pierre Luc Carrier, Yasmine Benabed, Michał Koziarski, Victor Schmidt
Accelerating material discovery holds the potential to greatly help mitigate the climate crisis.
2 code implementations • ICLR 2021 • Meng Qu, Junkun Chen, Louis-Pascal Xhonneux, Yoshua Bengio, Jian Tang
Then in the E-step, we select a set of high-quality rules from all generated rules with both the rule generator and reasoning predictor via posterior inference; and in the M-step, the rule generator is updated with the rules selected in the E-step.
2 code implementations • 9 Oct 2014 • Stephan Gouws, Yoshua Bengio, Greg Corrado
We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data.
Ranked #1 on Document Classification on Reuters En-De
Cross-Lingual Document Classification Document Classification
3 code implementations • ICLR 2021 • Anirudh Goyal, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, Bernhard Schölkopf
Learning modular structures which reflect the dynamics of the environment can lead to better generalization and robustness to changes which only affect a few of the underlying causes.
2 code implementations • 1 Feb 2018 • Konstantinos Drossos, Stylianos Ioannis Mimilakis, Dmitriy Serdyuk, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio
Current state of the art (SOTA) results in monaural singing voice separation are obtained with deep learning based methods.
Sound Audio and Speech Processing
1 code implementation • 22 Nov 2017 • Benjamin Scellier, Yoshua Bengio
Recurrent Backpropagation and Equilibrium Propagation are supervised learning algorithms for fixed point recurrent neural networks which differ in their second phase.
2 code implementations • 16 Feb 2016 • Benjamin Scellier, Yoshua Bengio
Because the objective function is defined in terms of local perturbations, the second phase of Equilibrium Propagation corresponds to only nudging the prediction (fixed point, or stationary distribution) towards a configuration that reduces prediction error.
3 code implementations • 14 Aug 2018 • Benjamin Scellier, Anirudh Goyal, Jonathan Binas, Thomas Mesnard, Yoshua Bengio
The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists.
2 code implementations • TACL 2016 • Felix Hill, Kyunghyun Cho, Anna Korhonen, Yoshua Bengio
Distributional models that learn rich semantic word representations are a success story of recent NLP research.
5 code implementations • 24 Jun 2012 • Yoshua Bengio, Aaron Courville, Pascal Vincent
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data.
1 code implementation • ICCV 2019 • Md Mahfuzur Rahman Siddiquee, Zongwei Zhou, Nima Tajbakhsh, Ruibin Feng, Michael B. Gotway, Yoshua Bengio, Jianming Liang
Qualitative and quantitative evaluations demonstrate that the proposed method outperforms the state of the art in multi-domain image-to-image translation and that it surpasses predominant weakly-supervised localization methods in both disease detection and localization.
2 code implementations • 2 Oct 2019 • Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Hugo Larochelle, Bernhard Schölkopf, Michael C. Mozer, Chris Pal, Yoshua Bengio
Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data.
1 code implementation • 6 Sep 2021 • Nino Scherrer, Olexa Bilaniuk, Yashas Annadani, Anirudh Goyal, Patrick Schwab, Bernhard Schölkopf, Michael C. Mozer, Yoshua Bengio, Stefan Bauer, Nan Rosemary Ke
Discovering causal structures from data is a challenging inference problem of fundamental importance in all areas of science.
2 code implementations • 11 Mar 2024 • Minsu Kim, Sanghyeok Choi, Jiwoo Son, Hyeonah Kim, Jinkyoo Park, Yoshua Bengio
This paper introduces the Generative Flow Ant Colony Sampler (GFACS), a novel neural-guided meta-heuristic algorithm for combinatorial optimization.
2 code implementations • ICLR 2019 • Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville
Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with $100\%$ accuracy.
1 code implementation • NeurIPS 2018 • Abel Gonzalez-Garcia, Joost Van de Weijer, Yoshua Bengio
We compare our model to the state-of-the-art in multi-modal image translation and achieve better results for translation on challenging datasets as well as for cross-domain retrieval on realistic datasets.
1 code implementation • 6 Oct 2023 • Edward J. Hu, Moksh Jain, Eric Elmoznino, Younesse Kaddar, Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin
Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions.
3 code implementations • ICCV 2019 • Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, Graham W. Taylor
Conditional text-to-image generation is an active area of research, with many possible applications.
Ranked #2 on Text-to-Image Generation on GeNeVA (i-CLEVR)
1 code implementation • 18 Jun 2018 • Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio
We find this approach provides an advantage for particular tasks in a low data regime but is very dependent on the quality of the graph used.
2 code implementations • 3 Feb 2022 • Dinghuai Zhang, Nikolay Malkin, Zhen Liu, Alexandra Volokhova, Aaron Courville, Yoshua Bengio
We present energy-based generative flow networks (EB-GFN), a novel probabilistic modeling algorithm for high-dimensional discrete data.
1 code implementation • 28 Feb 2022 • Tristan Deleu, António Góis, Chris Emezue, Mansi Rankawat, Simon Lacoste-Julien, Stefan Bauer, Yoshua Bengio
In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) structure of Bayesian networks, from data.
1 code implementation • 12 Dec 2023 • Alexandre Duval, Simon V. Mathis, Chaitanya K. Joshi, Victor Schmidt, Santiago Miret, Fragkiskos D. Malliaros, Taco Cohen, Pietro Liò, Yoshua Bengio, Michael Bronstein
In these graphs, the geometric attributes transform according to the inherent physical symmetries of 3D atomic systems, including rotations and translations in Euclidean space, as well as node permutations.
1 code implementation • 25 Sep 2019 • Vikas Verma, Meng Qu, Kenji Kawaguchi, Alex Lamb, Yoshua Bengio, Juho Kannala, Jian Tang
We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization.
Ranked #1 on Node Classification on Pubmed random partition
2 code implementations • ICLR 2022 • Victor Schmidt, Alexandra Sasha Luccioni, Mélisande Teng, Tianyu Zhang, Alexia Reynaud, Sunand Raghupathi, Gautier Cosne, Adrien Juraver, Vahe Vardanyan, Alex Hernandez-Garcia, Yoshua Bengio
Climate change is a major threat to humanity, and the actions required to prevent its catastrophic consequences include changes in both policy-making and individual behaviour.
1 code implementation • 2 Mar 2022 • Moksh Jain, Emmanuel Bengio, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Bonaventure F. P. Dossou, Chanakya Ekbote, Jie Fu, Tianyu Zhang, Micheal Kilgour, Dinghuai Zhang, Lena Simine, Payel Das, Yoshua Bengio
In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective to obtain a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round.
1 code implementation • ICLR 2018 • Konrad Zolna, Devansh Arpit, Dendi Suhubdy, Yoshua Bengio
We show that our regularization term is upper bounded by the expectation-linear dropout objective which has been shown to address the gap due to the difference between the train and inference phases of dropout.
Ranked #28 on Language Modelling on Penn Treebank (Word Level)
1 code implementation • 21 Dec 2013 • Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, Yoshua Bengio
Catastrophic forgetting is a problem faced by many machine learning models and algorithms.
1 code implementation • 22 Dec 2014 • Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David
For each of those datasets and for each of those formats, we assess the impact of the precision of the multiplications on the final error after training.
1 code implementation • 26 Apr 2020 • Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Hao-Ran Wei, Shengchao Liu, Karam M. J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua Bengio
Over the last decade, there has been significant progress in the field of machine learning for de novo drug design, particularly in deep generative models.
1 code implementation • ICML 2020 • Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Hao-Ran Wei, Yashaswi Pathak, Shengchao Liu, Simon Blackburn, Karam Thomas, Connor Coley, Jian Tang, Sarath Chandar, Yoshua Bengio
In this work, we propose a novel reinforcement learning (RL) setup for drug discovery that addresses this challenge by embedding the concept of synthetic accessibility directly into the de novo compound design system.
1 code implementation • NeurIPS 2023 • Alexandre Lacoste, Nils Lehmann, Pau Rodriguez, Evan David Sherwin, Hannah Kerner, Björn Lütjens, Jeremy Andrew Irvin, David Dao, Hamed Alemohammad, Alexandre Drouin, Mehmet Gunturkun, Gabriel Huang, David Vazquez, Dava Newman, Yoshua Bengio, Stefano Ermon, Xiao Xiang Zhu
Recent progress in self-supervision has shown that pre-training large neural networks on vast amounts of unsupervised data can lead to substantial increases in generalization to downstream tasks.
2 code implementations • ACL 2018 • Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio
In this work, we propose a novel constituency parsing scheme.
1 code implementation • 12 Feb 2020 • Giulia Zarpellon, Jason Jo, Andrea Lodi, Yoshua Bengio
We aim instead at learning a policy that generalizes across heterogeneous MILPs: our main hypothesis is that parameterizing the state of the B&B search tree can aid this type of generalization.
2 code implementations • NeurIPS 2015 • Yann N. Dauphin, Harm de Vries, Yoshua Bengio
Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks.
1 code implementation • NeurIPS 2021 • Mingde Zhao, Zhen Liu, Sitao Luan, Shuyuan Zhang, Doina Precup, Yoshua Bengio
We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state during planning.
Model-based Reinforcement Learning Out-of-Distribution Generalization
2 code implementations • NeurIPS 2018 • Taesup Kim, Jaesik Yoon, Ousmane Dia, Sungwoong Kim, Yoshua Bengio, Sungjin Ahn
Learning to infer the Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning due to the model uncertainty inherent in the problem.
1 code implementation • ACL 2017 • Ryan Lowe, Michael Noseworthy, Iulian V. Serban, Nicolas Angelard-Gontier, Yoshua Bengio, Joelle Pineau
Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem.
1 code implementation • 15 Apr 2024 • Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Yoshua Bengio, Danqi Chen, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs).
2 code implementations • 25 Mar 2017 • Joseph Paul Cohen, Genevieve Boucher, Craig A. Glastonbury, Henry Z. Lo, Yoshua Bengio
Our contribution is redundant counting instead of predicting a density map in order to average over errors.
1 code implementation • 1 Jun 2023 • Oussama Boussif, Ghait Boukachab, Dan Assouline, Stefano Massaroli, Tianle Yuan, Loubna Benabbou, Yoshua Bengio
Solar power harbors immense potential in mitigating climate change by substantially reducing CO$_{2}$ emissions.
1 code implementation • ICLR 2018 • Philemon Brakel, Yoshua Bengio
We propose to learn independent features with adversarial objectives which optimize such measures implicitly.
1 code implementation • 15 May 2021 • Minkai Xu, Wujie Wang, Shitong Luo, Chence Shi, Yoshua Bengio, Rafael Gomez-Bombarelli, Jian Tang
Specifically, the molecular graph is first encoded in a latent space, and then the 3D structures are generated by solving a principled bilevel optimization program.
2 code implementations • 11 Jun 2014 • Jörg Bornschein, Yoshua Bengio
The wake-sleep algorithm relies on training not just the directed generative model but also a conditional generative model (the inference network) that runs backward from visible to latent, estimating the posterior distribution of latent given visible.
1 code implementation • 28 Feb 2022 • Edoardo M. Ponti, Alessandro Sordoni, Yoshua Bengio, Siva Reddy
By jointly learning these and a task-skill allocation matrix, the network for each task is instantiated as the average of the parameters of active skills.
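The averaging step described above can be illustrated with a minimal sketch. It assumes a fixed binary allocation matrix and plain Python lists for the skill parameters; in the paper both are learned jointly, and the names `task_parameters`, `skill_params`, and `allocation` are hypothetical.

```python
def task_parameters(allocation_row, skill_params):
    """Average the parameter vectors of the skills marked active (1) for one task."""
    active = [p for z, p in zip(allocation_row, skill_params) if z]
    if not active:
        raise ValueError("task has no active skills")
    dim = len(active[0])
    return [sum(p[i] for p in active) / len(active) for i in range(dim)]

# Three skills with 2-dimensional parameters (toy values, not from the paper).
skill_params = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
# Rows are tasks, columns are skills: task 0 uses skills 0 and 1, task 1 uses skills 1 and 2.
allocation = [[1, 1, 0], [0, 1, 1]]
theta = [task_parameters(row, skill_params) for row in allocation]
```

Because skill 1 is shared, both tasks reuse its parameters, which is the kind of sharing an allocation matrix makes explicit.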
2 code implementations • 20 Nov 2015 • Martin Arjovsky, Amar Shah, Yoshua Bengio
When the eigenvalues of the hidden-to-hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well-studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies.
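A small numerical sketch can make this concrete. It assumes a purely linear recurrence (no nonlinearity) whose weight matrix is a scaled 2x2 rotation, so every eigenvalue has absolute value `scale`; the function name `backprop_norm` and the chosen angle are illustrative assumptions, not taken from the paper.

```python
import math

def backprop_norm(scale, steps, angle=0.3):
    """Norm of a gradient vector after `steps` multiplications by W^T,
    where W = scale * rotation(angle)."""
    c, s = math.cos(angle), math.sin(angle)
    g = [1.0, 0.0]
    for _ in range(steps):
        # Apply the transposed weight matrix once per time step.
        g = [scale * (c * g[0] + s * g[1]), scale * (-s * g[0] + c * g[1])]
    return math.hypot(g[0], g[1])

vanishing = backprop_norm(0.9, 100)   # shrinks like 0.9 ** 100
preserved = backprop_norm(1.0, 100)   # stays at norm 1
exploding = backprop_norm(1.1, 100)   # grows like 1.1 ** 100
```

Only the case with eigenvalue magnitude exactly 1 keeps the gradient norm constant over 100 steps, which is the motivation for constraining the recurrent matrix to be unitary.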
Ranked #26 on Sequential Image Classification on Sequential MNIST
1 code implementation • 2 Jul 2021 • Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, Christopher Pal
A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure.
1 code implementation • NeurIPS 2023 • Lazar Atanackovic, Alexander Tong, Bo Wang, Leo J. Lee, Yoshua Bengio, Jason Hartford
In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges.
3 code implementations • ICLR 2021 • Minkai Xu, Shitong Luo, Yoshua Bengio, Jian Peng, Jian Tang
Inspired by the recent progress in deep generative models, in this paper, we propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
1 code implementation • IJCNLP 2019 • Xingdi Yuan, Marc-Alexandre Cote, Jie Fu, Zhouhan Lin, Christopher Pal, Yoshua Bengio, Adam Trischler
In QAit, an agent must interact with a partially observable text-based environment to gather information required to answer questions.
1 code implementation • NeurIPS 2020 • Prateek Gupta, Maxime Gasse, Elias B. Khalil, M. Pawan Kumar, Andrea Lodi, Yoshua Bengio
First, in a more realistic setting where only a CPU is available, is the GNN model still competitive?
1 code implementation • 26 May 2023 • Dinghuai Zhang, Hanjun Dai, Nikolay Malkin, Aaron Courville, Yoshua Bengio, Ling Pan
In this paper, we design Markov decision processes (MDPs) for different combinatorial problems and propose to train conditional GFlowNets to sample from the solution space.
1 code implementation • ACL 2016 • Iulian Vlad Serban, Alberto García-Durán, Caglar Gulcehre, Sungjin Ahn, Sarath Chandar, Aaron Courville, Yoshua Bengio
Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances.
1 code implementation • 12 Jun 2015 • Jorg Bornschein, Samira Shabanian, Asja Fischer, Yoshua Bengio
We present a lower bound for the likelihood of this model and we show that optimizing this bound regularizes the model so that the Bhattacharyya distance between the bottom-up and top-down approximate distributions is minimized.
3 code implementations • 10 Jun 2019 • David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio
Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help.
2 code implementations • 4 Oct 2023 • Dinghuai Zhang, Ricky T. Q. Chen, Cheng-Hao Liu, Aaron Courville, Yoshua Bengio
We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics.
1 code implementation • 8 Jun 2017 • Li Jing, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio
We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory.
Ranked #7 on Question Answering on bAbi (Accuracy (trained on 1k) metric)
1 code implementation • ICLR 2018 • Samira Ebrahimi Kahou, Vincent Michalski, Adam Atkinson, Akos Kadar, Adam Trischler, Yoshua Bengio
Resolving such questions often requires reference to multiple plot elements and synthesis of information distributed spatially throughout a figure.
Ranked #3 on Visual Question Answering (VQA) on FigureQA - test 1
1 code implementation • ACL 2019 • Chinnadhurai Sankar, Sandeep Subramanian, Christopher Pal, Sarath Chandar, Yoshua Bengio
Neural generative models have become increasingly popular when building conversational agents.
1 code implementation • 22 Jun 2020 • Yihe Dong, Will Sawin, Yoshua Bengio
Hypergraphs provide a natural representation for many real world datasets.
5 code implementations • 28 Nov 2016 • Adriana Romero, Pierre Luc Carrier, Akram Erraqabi, Tristan Sylvain, Alex Auvolat, Etienne Dejoie, Marc-André Legault, Marie-Pierre Dubé, Julie G. Hussin, Yoshua Bengio
It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g., for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units).
1 code implementation • NeurIPS 2017 • Anirudh Goyal, Alessandro Sordoni, Marc-Alexandre Côté, Nan Rosemary Ke, Yoshua Bengio
Stochastic recurrent models have been successful in capturing the variability observed in natural sequential data such as speech.
2 code implementations • ICML 2017 • Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness.
1 code implementation • 23 Dec 2014 • Dong-Hyun Lee, Saizheng Zhang, Asja Fischer, Yoshua Bengio
Back-propagation has been the workhorse of recent successes of deep learning but it relies on infinitesimal effects (partial derivatives) in order to perform credit assignment.
1 code implementation • NeurIPS 2019 • Christopher Beckham, Sina Honari, Vikas Verma, Alex Lamb, Farnoosh Ghadiri, R. Devon Hjelm, Yoshua Bengio, Christopher Pal
In this paper, we explore new approaches to combining information encoded within the learned representations of auto-encoders.
1 code implementation • 3 Sep 2019 • Jordan Hoffmann, Louis Maestrati, Yoshihide Sawada, Jian Tang, Jean Michel Sellier, Yoshua Bengio
We present a method to encode and decode the position of atoms in 3-D molecules from a dataset of nearly 50,000 stable crystal unit cells that vary from containing 1 to over 100 atoms.
2 code implementations • ECCV 2020 • Timo Milbich, Karsten Roth, Homanga Bharadhwaj, Samarth Sinha, Yoshua Bengio, Björn Ommer, Joseph Paul Cohen
Visual Similarity plays an important role in many computer vision applications.
Ranked #13 on Metric Learning on CUB-200-2011 (using extra training data)
1 code implementation • 26 Mar 2018 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR).
Ranked #6 on Speech Recognition on TIMIT
Automatic Speech Recognition (ASR)
1 code implementation • 29 Sep 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
First, we suggest removing the reset gate in the GRU design, resulting in a more efficient single-gate architecture.
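As a rough illustration of what a GRU step looks like once the reset gate is gone, here is a minimal sketch. It assumes the common update convention h_t = z * h_{t-1} + (1 - z) * candidate, a tanh candidate, and no biases; the weight names `Wz`, `Uz`, `Wh`, `Uh` are hypothetical, and the paper's full design may differ in other respects.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def single_gate_gru_step(x, h, Wz, Uz, Wh, Uh):
    """One recurrent step with only an update gate z (no reset gate)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    # The candidate state reads h directly; a standard GRU would first
    # scale h elementwise by a reset gate here.
    cand = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, h))]
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]

# Toy check with all-zero weights: z = 0.5 and candidate = 0, so h is halved.
zeros_in = [[0.0], [0.0]]                  # 2 hidden units, 1 input feature
zeros_rec = [[0.0, 0.0], [0.0, 0.0]]       # 2 x 2 recurrent weights
h_next = single_gate_gru_step([1.0], [2.0, -4.0], zeros_in, zeros_rec, zeros_in, zeros_rec)
```

Dropping the reset gate removes one pair of weight matrices and one gate computation per step, which is where the efficiency gain would come from in this sketch.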
1 code implementation • 21 Oct 2019 • Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen
We release the largest public ECG dataset of continuous raw signals for representation learning containing 11 thousand patients and 2 billion labelled beats.
1 code implementation • 11 Oct 2022 • Oussama Boussif, Dan Assouline, Loubna Benabbou, Yoshua Bengio
The computational complexity of classical numerical methods for solving Partial Differential Equations (PDE) scales significantly as the resolution increases.
1 code implementation • 6 Jun 2022 • Sarthak Mittal, Yoshua Bengio, Guillaume Lajoie
Inspired by human cognition, machine learning systems are gradually revealing advantages of sparser and more modular architectures.
2 code implementations • 15 Aug 2022 • Tianyu Zhang, Andrew Williams, Soham Phade, Sunil Srinivasa, Yang Zhang, Prateek Gupta, Yoshua Bengio, Stephan Zheng
To facilitate this research, here we introduce RICE-N, a multi-region integrated assessment model that simulates the global climate and economy, and which can be used to design and evaluate the strategic outcomes for different negotiation and agreement frameworks.
1 code implementation • 13 Feb 2023 • Edward J. Hu, Nikolay Malkin, Moksh Jain, Katie Everett, Alexandros Graikos, Yoshua Bengio
Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents.
1 code implementation • 9 Dec 2020 • Shimaa Baraka, Benjamin Akera, Bibek Aryal, Tenzing Sherpa, Finu Shresta, Anthony Ortiz, Kris Sankaran, Juan Lavista Ferres, Mir Matin, Yoshua Bengio
Glacier mapping is key to ecological monitoring in the Hindu Kush Himalaya (HKH) region.
1 code implementation • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020 • Md Rifat Arefin, Vincent Michalski, Pierre-Luc St-Charles, Alfredo Kalaitzis, Sookyung Kim, Samira E. Kahou, Yoshua Bengio
High-resolution satellite imagery is critical for various earth observation applications related to environment monitoring, geoscience, forecasting, and land use analysis.
1 code implementation • ICCV 2021 • Yuwei Cheng, Jiannan Zhu, Mengxin Jiang, Jie Fu, Changsong Pang, Peidong Wang, Kris Sankaran, Olawale Onabola, Yimin Liu, Dianbo Liu, Yoshua Bengio
To promote the practical application of autonomous floating waste cleaning, we present FloW, the first dataset for floating waste detection in inland water areas.
1 code implementation • 23 Feb 2018 • Dmitriy Serdyuk, Yongqiang Wang, Christian Fuegen, Anuj Kumar, Baiyang Liu, Yoshua Bengio
A spoken language understanding system is traditionally designed as a pipeline of several components.
Natural Language Understanding Spoken Language Understanding
1 code implementation • NeurIPS 2017 • Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio
The energy function is then modified so the model and data distributions match, with no guarantee on the number of steps required for the Markov chain to converge.
4 code implementations • NeurIPS 2014 • Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio
Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.
1 code implementation • ICML 2020 • Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio
To effectively utilize the wealth of potential top-down information available, and to prevent the cacophony of intermixed signals in a bidirectional architecture, mechanisms are needed to restrict information flow.
1 code implementation • 16 Feb 2021 • Salem Lahlou, Moksh Jain, Hadi Nekoei, Victor Ion Butoi, Paul Bertin, Jarrid Rector-Brooks, Maksym Korablyov, Yoshua Bengio
Epistemic Uncertainty is a measure of the lack of knowledge of a learner which diminishes with more evidence.
2 code implementations • NeurIPS 2021 • Kevin Xia, Kai-Zhan Lee, Yoshua Bengio, Elias Bareinboim
Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM.
1 code implementation • 27 Dec 2022 • Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi
Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels.
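The interpolation described in this abstract can be sketched in a few lines. This is a minimal illustration of generic mixup, not code from the listed paper; the helper names `mixup_pair` and `sample_lambda` and the default `alpha=0.2` are assumptions made for the example.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Blend two examples: the same weight lam is applied to inputs and labels."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


def sample_lambda(alpha=0.2, rng=random):
    """Draw the mixing weight from Beta(alpha, alpha); alpha is an assumed default."""
    return rng.betavariate(alpha, alpha)


# Two toy examples with one-hot labels (hypothetical data).
x_mix, y_mix = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 1.0], lam=0.25)
print(x_mix, y_mix)
```

With `lam=0.25` the mixed label carries 25% of the first class and 75% of the second, matching the weights applied to the inputs.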
1 code implementation • 1 Oct 2022 • Jiaye Teng, Chuan Wen, Dinghuai Zhang, Yoshua Bengio, Yang Gao, Yang Yuan
Conformal prediction is a distribution-free technique for establishing valid prediction intervals.
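As a rough illustration of how such intervals can be built, the sketch below implements standard split conformal prediction with absolute-residual scores. It is not the method proposed in the listed paper; the function name `split_conformal_interval` and the toy calibration data are assumptions, and the coverage guarantee relies on calibration and test points being exchangeable.

```python
import math


def split_conformal_interval(cal_preds, cal_targets, test_pred, alpha=0.1):
    """Return an interval around test_pred with target coverage 1 - alpha."""
    # Nonconformity score: absolute residual on held-out calibration data.
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    # Finite-sample corrected rank of the calibration quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only a trivial interval is valid.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (test_pred - q, test_pred + q)


# Hypothetical calibration set with residuals 1, 2 and 3.
interval = split_conformal_interval([0.0, 0.0, 0.0], [1.0, -2.0, 3.0], 10.0, alpha=0.5)
print(interval)
```

The procedure makes no assumption about the form of the data distribution, which is what "distribution-free" refers to; the underlying predictor is treated as a black box.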
1 code implementation • 9 Feb 2024 • Tara Akhound-Sadegh, Jarrid Rector-Brooks, Avishek Joey Bose, Sarthak Mittal, Pablo Lemos, Cheng-Hao Liu, Marcin Sendera, Siamak Ravanbakhsh, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Alexander Tong
Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science.
1 code implementation • 5 Jun 2014 • Tapani Raiko, Li Yao, Kyunghyun Cho, Yoshua Bengio
Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data.
Ranked #7 on Image Generation on Binarized MNIST
1 code implementation • NeurIPS 2014 • Tapani Raiko, Yao Li, Kyunghyun Cho, Yoshua Bengio
Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data.
Ranked #8 on Image Generation on Binarized MNIST
1 code implementation • 7 Feb 2022 • Paul Bertin, Jarrid Rector-Brooks, Deepak Sharma, Thomas Gaudelet, Andrew Anighoro, Torsten Gross, Francisco Martinez-Pena, Eileen L. Tang, Suraj M S, Cristian Regep, Jeremy Hayter, Maksym Korablyov, Nicholas Valiante, Almer van der Sloot, Mike Tyers, Charles Roberts, Michael M. Bronstein, Luke L. Lairson, Jake P. Taylor-King, Yoshua Bengio
For large libraries of small molecules, exhaustive combinatorial chemical screens become infeasible to perform when considering a range of disease models, assay conditions, and dose ranges.
1 code implementation • 24 Sep 2022 • Kartik Ahuja, Divyat Mahajan, Yixin Wang, Yoshua Bengio
Can interventional data facilitate causal representation learning?
1 code implementation • NeurIPS 2019 • Giancarlo Kerg, Kyle Goyette, Maximilian Puelma Touzel, Gauthier Gidel, Eugene Vorontsov, Yoshua Bengio, Guillaume Lajoie
A recent strategy to circumvent the exploding and vanishing gradient problem in RNNs, and to allow the stable propagation of signals over long time scales, is to constrain recurrent connectivity matrices to be orthogonal or unitary.
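The motivation for the orthogonality constraint can be seen in a toy linear recurrence. The sketch below is an illustration under simplifying assumptions (a 2x2 rotation matrix, no nonlinearity, no inputs) and is not taken from the listed paper; all function names are hypothetical.

```python
import math


def rotation(theta, scale=1.0):
    """2x2 rotation matrix times scale; it is orthogonal exactly when scale == 1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[scale * c, -scale * s], [scale * s, scale * c]]


def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def norm_after_steps(W, h, steps):
    """Apply h <- W h repeatedly and return the final Euclidean norm."""
    for _ in range(steps):
        h = matvec(W, h)
    return math.sqrt(sum(x * x for x in h))


h0 = [3.0, 4.0]  # initial hidden state with norm 5
preserved = norm_after_steps(rotation(0.3), h0, 100)            # orthogonal: norm is kept
vanished = norm_after_steps(rotation(0.3, scale=0.9), h0, 100)  # contracting: norm decays
exploded = norm_after_steps(rotation(0.3, scale=1.1), h0, 100)  # expanding: norm grows
print(preserved, vanished, exploded)
```

Because gradients in a linear recurrence are propagated through repeated products of the same matrix, the same behavior applies to them: an orthogonal matrix keeps their norm fixed, while singular values below or above one make them vanish or explode.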