Search Results for author: Yoshua Bengio

Found 426 papers, 215 papers with code

Learning to Navigate in Synthetically Accessible Chemical Space Using Reinforcement Learning

1 code implementation ICML 2020 Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Hao-Ran Wei, Yashaswi Pathak, Shengchao Liu, Simon Blackburn, Karam Thomas, Connor Coley, Jian Tang, Sarath Chandar, Yoshua Bengio

In this work, we propose a novel reinforcement learning (RL) setup for drug discovery that addresses this challenge by embedding the concept of synthetic accessibility directly into the de novo compound design system.

Drug Discovery

Discrete-Valued Neural Communication

no code implementations 6 Jul 2021 Dianbo Liu, Alex Lamb, Kenji Kawaguchi, Anirudh Goyal, Chen Sun, Michael Curtis Mozer, Yoshua Bengio

Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes.

Quantization Systematic Generalization

The Causal-Neural Connection: Expressiveness, Learnability, and Inference

no code implementations 2 Jul 2021 Kevin Xia, Kai-Zhan Lee, Yoshua Bengio, Elias Bareinboim

Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM.

Causal Identification Causal Inference

Predicting Unreliable Predictions by Shattering a Neural Network

1 code implementation 15 Jun 2021 Xu Ji, Razvan Pascanu, Devon Hjelm, Andrea Vedaldi, Balaji Lakshminarayanan, Yoshua Bengio

Piecewise linear neural networks can be split into subfunctions, each with its own activation pattern, domain, and empirical error.

Variational Causal Networks: Approximate Bayesian Inference over Causal Structures

1 code implementation 14 Jun 2021 Yashas Annadani, Jonas Rothfuss, Alexandre Lacoste, Nino Scherrer, Anirudh Goyal, Yoshua Bengio, Stefan Bauer

However, a crucial aspect of acting intelligently upon causal structure inferred from finite data is reasoning about the uncertainty of that inference.

Bayesian Inference Causal Inference +2

Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

1 code implementation 11 Jun 2021 Kartik Ahuja, Ethan Caballero, Dinghuai Zhang, Yoshua Bengio, Ioannis Mitliagkas, Irina Rish

To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD.

Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

1 code implementation 8 Jun 2021 Emmanuel Bengio, Moksh Jain, Maksym Korablyov, Doina Precup, Yoshua Bengio

Using insights from Temporal Difference learning, we propose GFlowNet, based on a view of the generative process as a flow network, making it possible to handle the tricky case where different trajectories can yield the same final state, e.g., there are many ways to sequentially add atoms to generate some molecular graph.
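
As a toy illustration of the flow-network view (the DAG, flows, and rewards below are invented for this example, not taken from the paper): if the edge flows satisfy the flow-matching condition, i.e. inflow equals outflow at every interior state and the inflow of a terminal state equals its reward, then sampling each transition proportionally to its edge flow reaches terminal states with probability proportional to their rewards, even when several trajectories lead to the same terminal state.

rewards = {"x1": 2.0, "x2": 1.0}  # terminal-state rewards
# Edge flows chosen so that inflow = outflow at interior states and inflow(x) = reward(x) at terminals.
flows = {("s0", "sA"): 1.5, ("s0", "sB"): 1.5,
         ("sA", "x1"): 1.5,
         ("sB", "x1"): 0.5, ("sB", "x2"): 1.0}
children = {"s0": ["sA", "sB"], "sA": ["x1"], "sB": ["x1", "x2"]}
assert all(abs(sum(f for (s, t), f in flows.items() if t == x) - r) < 1e-9 for x, r in rewards.items())

def terminal_probs(state="s0"):
    # Probability of ending in each terminal state when every transition is
    # sampled proportionally to its edge flow (the forward policy of a GFlowNet).
    if state not in children:
        return {state: 1.0}
    total = sum(flows[(state, c)] for c in children[state])
    out = {}
    for c in children[state]:
        for terminal, p in terminal_probs(c).items():
            out[terminal] = out.get(terminal, 0.0) + (flows[(state, c)] / total) * p
    return out

print(terminal_probs())  # {'x1': 0.666..., 'x2': 0.333...}, i.e. proportional to the 2:1 rewards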

A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

2 code implementations 3 Jun 2021 Mingde Zhao, Zhen Liu, Sitao Luan, Shuyuan Zhang, Doina Precup, Yoshua Bengio

We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state, in order to plan and to generalize better out-of-distribution.

Model-based Reinforcement Learning

Fast and Slow Learning of Recurrent Independent Mechanisms

no code implementations 18 May 2021 Kanika Madan, Nan Rosemary Ke, Anirudh Goyal, Bernhard Schölkopf, Yoshua Bengio

To study these ideas, we propose a particular training framework in which we assume that the pieces of knowledge an agent needs and its reward function are stationary and can be re-used across tasks.

Meta-Learning

An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming

1 code implementation 15 May 2021 Minkai Xu, Wujie Wang, Shitong Luo, Chence Shi, Yoshua Bengio, Rafael Gomez-Bombarelli, Jian Tang

Specifically, the molecular graph is first encoded in a latent space, and then the 3D structures are generated by solving a principled bilevel optimization program.

bilevel optimization

hBert + BiasCorp -- Fighting Racism on the Web

no code implementations 6 Apr 2021 Olawale Onabola, Zhuang Ma, Yang Xie, Benjamin Akera, Abdulrahman Ibraheem, Jia Xue, Dianbo Liu, Yoshua Bengio

In this work, we present hBERT, where we modify certain layers of the pretrained BERT model with the new Hopfield Layer.

Neural Production Systems

no code implementations 2 Mar 2021 Anirudh Goyal, Aniket Didolkar, Nan Rosemary Ke, Charles Blundell, Philippe Beaudoin, Nicolas Heess, Michael Mozer, Yoshua Bengio

First, GNNs do not predispose interactions to be sparse, as relationships among independent entities are likely to be.

Coordination Among Neural Modules Through a Shared Global Workspace

no code implementations 1 Mar 2021 Anirudh Goyal, Aniket Didolkar, Alex Lamb, Kartikeya Badola, Nan Rosemary Ke, Nasim Rahaman, Jonathan Binas, Charles Blundell, Michael Mozer, Yoshua Bengio

We explore the use of such a communication channel in the context of deep learning for modeling the structure of complex environments.

Transformers with Competitive Ensembles of Independent Mechanisms

no code implementations 27 Feb 2021 Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio

In this work we explore a way in which the Transformer architecture is deficient: it represents each position with a large monolithic hidden representation and a single set of parameters which are applied over the entire hidden representation.

Speech Enhancement

Learning Neural Generative Dynamics for Molecular Conformation Generation

3 code implementations ICLR 2021 Minkai Xu, Shitong Luo, Yoshua Bengio, Jian Peng, Jian Tang

Inspired by the recent progress in deep generative models, in this paper, we propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.

DEUP: Direct Epistemic Uncertainty Prediction

2 code implementations 16 Feb 2021 Moksh Jain, Salem Lahlou, Hadi Nekoei, Victor Butoi, Paul Bertin, Jarrid Rector-Brooks, Maksym Korablyov, Yoshua Bengio

Epistemic uncertainty is the part of out-of-sample prediction error due to the lack of knowledge of the learner.
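
For reference, the standard decomposition behind that statement, for squared error and a fixed predictor $f$ (generic notation, not the paper's exact estimator): the expected out-of-sample error at an input $x$ splits into an irreducible aleatoric part and a reducible epistemic part that shrinks as the learner gains knowledge, and DEUP-style approaches aim to predict the total error while accounting for the aleatoric term:

$$\mathbb{E}\big[(y - f(x))^2 \mid x\big] \;=\; \underbrace{\sigma^2(x)}_{\text{aleatoric noise}} \;+\; \underbrace{\big(f(x) - \mathbb{E}[y \mid x]\big)^2}_{\text{epistemic (reducible) error}}.$$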

Active Learning

Structured Sparsity Inducing Adaptive Optimizers for Deep Learning

1 code implementation 7 Feb 2021 Tristan Deleu, Yoshua Bengio

The parameters of a neural network are naturally organized in groups, some of which might not contribute to its overall performance.

Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias

no code implementations 14 Jan 2021 Axel Laborieux, Maxence Ernoult, Benjamin Scellier, Yoshua Bengio, Julie Grollier, Damien Querlioz

Equilibrium Propagation (EP) is a biologically-inspired counterpart of Backpropagation Through Time (BPTT) which, owing to its strong theoretical guarantees and the locality in space of its learning rule, fosters the design of energy-efficient hardware dedicated to learning.

Neural Bayes: A Generic Parameterization Method for Unsupervised Learning

no code implementations 1 Jan 2021 Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio

Disjoint Manifold Separation: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.

Unsupervised Representation Learning

Spatially Structured Recurrent Modules

no code implementations ICLR 2021 Nasim Rahaman, Anirudh Goyal, Muhammad Waleed Gondal, Manuel Wuthrich, Stefan Bauer, Yash Sharma, Yoshua Bengio, Bernhard Schölkopf

Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalise well and are robust to changes in the input distribution.

Video Prediction

Dependency Structure Discovery from Interventions

no code implementations 1 Jan 2021 Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Bernhard Schölkopf, Michael Curtis Mozer, Hugo Larochelle, Christopher Pal, Yoshua Bengio

Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data.

Conditional Networks

no code implementations 1 Jan 2021 Anthony Ortiz, Kris Sankaran, Olac Fuentes, Christopher Kiekintveld, Pascal Vincent, Yoshua Bengio, Doina Precup

In this work we tackle the problem of out-of-distribution generalization through conditional computation.

Image Classification Semantic Segmentation

Systematic generalisation with group invariant predictions

no code implementations ICLR 2021 Faruk Ahmed, Yoshua Bengio, Harm van Seijen, Aaron Courville

We consider situations where the presence of dominant simpler correlations with the target variable in a training set can cause an SGD-trained neural network to be less reliant on more persistently-correlating complex features.

Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments

no code implementations ICLR 2021 Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Charles Blundell, Sergey Levine, Yoshua Bengio, Michael Curtis Mozer

To use a video game as an illustration, two enemies of the same type will share schemata but will have separate object files to encode their distinct state (e.g., health, position).

Inductive Biases for Deep Learning of Higher-Level Cognition

no code implementations 30 Nov 2020 Anirudh Goyal, Yoshua Bengio

A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopedic list of heuristics).

Systematic Generalization

RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design

no code implementations 25 Nov 2020 Cheng-Hao Liu, Maksym Korablyov, Stanisław Jastrzębski, Paweł Włodarczyk-Pruszyński, Yoshua Bengio, Marwin H. S. Segler

A natural idea to mitigate this problem is to bias the search process towards more easily synthesizable molecules using a proxy for synthetic accessibility.

Gradient Starvation: A Learning Proclivity in Neural Networks

2 code implementations 18 Nov 2020 Mohammad Pezeshki, Sékou-Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie

We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks.

Predicting Infectiousness for Proactive Contact Tracing

1 code implementation ICLR 2021 Yoshua Bengio, Prateek Gupta, Tegan Maharaj, Nasim Rahaman, Martin Weiss, Tristan Deleu, Eilif Muller, Meng Qu, Victor Schmidt, Pierre-Luc St-Charles, Hannah Alsdurf, Olexa Bilaniuk, David Buckeridge, Gaétan Marceau Caron, Pierre-Luc Carrier, Joumana Ghosn, Satya Ortiz-Gagne, Chris Pal, Irina Rish, Bernhard Schölkopf, Abhinav Sharma, Jian Tang, Andrew Williams

Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT).

NU-GAN: High resolution neural upsampling with GAN

no code implementations 22 Oct 2020 Rithesh Kumar, Kundan Kumar, Vicki Anand, Yoshua Bengio, Aaron Courville

In this paper, we propose NU-GAN, a new method for resampling audio from lower to higher sampling rates (upsampling).

Audio Generation Speech Synthesis

Cross-Modal Information Maximization for Medical Imaging: CMIM

no code implementations 20 Oct 2020 Tristan Sylvain, Francis Dutil, Tess Berthier, Lisa Di Jorio, Margaux Luck, Devon Hjelm, Yoshua Bengio

In hospitals, data are siloed to specific information systems that make the same information available under different modalities such as the different medical imaging exams the patient undergoes (CT scans, MRI, PET, Ultrasound, etc.)

Image Classification

Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers

no code implementations 15 Oct 2020 Alex Lamb, Anirudh Goyal, Agnieszka Słowik, Michael Mozer, Philippe Beaudoin, Yoshua Bengio

Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer.

Domain Generalization

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

no code implementations ICLR 2021 Ossama Ahmed, Frederik Träuble, Anirudh Goyal, Alexander Neitz, Yoshua Bengio, Bernhard Schölkopf, Manuel Wüthrich, Stefan Bauer

To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment.

Transfer Learning

RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs

1 code implementation ICLR 2021 Meng Qu, Junkun Chen, Louis-Pascal Xhonneux, Yoshua Bengio, Jian Tang

Then in the E-step, we select a set of high-quality rules from all generated rules with both the rule generator and reasoning predictor via posterior inference; and in the M-step, the rule generator is updated with the rules selected in the E-step.
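
A toy, self-contained sketch of that E-step/M-step loop (the rule vocabulary, quality scores, and update below are illustrative stand-ins, not the paper's model): the E-step scores candidate rules by combining the generator's probability with a predictor-style quality score and keeps the top-K, and the M-step refits the generator on the selected rules.

import math, random
random.seed(0)

rule_vocab = ["r1", "r2", "r3", "r4", "r5"]                         # hypothetical logic-rule templates
gen_probs = {r: 1.0 / len(rule_vocab) for r in rule_vocab}          # rule generator: a categorical prior
quality = {"r1": 2.0, "r2": 0.5, "r3": 1.5, "r4": -1.0, "r5": 0.0}  # stand-in reasoning-predictor scores

def e_step(k=2, n_samples=100):
    # Sample candidate rules from the generator and keep the K best under a posterior-style
    # score that combines the generator prior with the predictor's quality estimate.
    candidates = set(random.choices(rule_vocab, weights=[gen_probs[r] for r in rule_vocab], k=n_samples))
    return sorted(candidates, key=lambda r: gen_probs[r] * math.exp(quality[r]), reverse=True)[:k]

def m_step(selected, smoothing=0.1):
    # Refit the generator so that it concentrates on the selected high-quality rules.
    counts = {r: smoothing + (1.0 if r in selected else 0.0) for r in rule_vocab}
    total = sum(counts.values())
    for r in rule_vocab:
        gen_probs[r] = counts[r] / total

for it in range(5):
    selected = e_step()
    m_step(selected)
    print(it, selected, {r: round(p, 2) for r, p in gen_probs.items()})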

Knowledge Graphs

Visual Concept Reasoning Networks

no code implementations 26 Aug 2020 Taesup Kim, Sungwoong Kim, Yoshua Bengio

It approximates sparsely connected networks by explicitly defining multiple branches to simultaneously learn representations with different visual concepts or properties.

Action Recognition Image Classification +3

Mastering Rate based Curriculum Learning

1 code implementation 14 Aug 2020 Lucas Willems, Salem Lahlou, Yoshua Bengio

Recent automatic curriculum learning algorithms, and in particular Teacher-Student algorithms, rely on the notion of learning progress, making the assumption that the good next tasks are the ones on which the learner is making the fastest progress or regress.

Curriculum Learning

Deriving Differential Target Propagation from Iterating Approximate Inverses

no code implementations 29 Jul 2020 Yoshua Bengio

We show that a particular form of target propagation, i.e., relying on learned inverses of each layer, which is differential, i.e., where the target is a small perturbation of the forward propagation, gives rise to an update rule which corresponds to an approximate Gauss-Newton gradient-based optimization, without requiring the manipulation or inversion of large matrices.
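
A compact sketch of the reasoning (generic difference-target-propagation notation, not verbatim from the paper): with layer maps $h_{l+1} = f_l(h_l)$, learned approximate inverses $g_l \approx f_l^{-1}$, and a differential target $t_{l+1} = h_{l+1} - \eta\,\partial L/\partial h_{l+1}$ for the layer above, a first-order expansion of the difference-corrected target for layer $l$ gives

$$t_l \;=\; h_l + g_l(t_{l+1}) - g_l(h_{l+1}) \;\approx\; h_l - \eta\, J_{g_l}\,\frac{\partial L}{\partial h_{l+1}} \;\approx\; h_l - \eta\,\big(J_{f_l}\big)^{-1}\frac{\partial L}{\partial h_{l+1}},$$

so nudging $h_l$ toward $t_l$ applies the inverse Jacobian rather than its transpose, i.e. an approximate Gauss-Newton step, without ever forming or inverting a large matrix explicitly.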

BabyAI 1.1

3 code implementations 24 Jul 2020 David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

This increases reinforcement learning sample efficiency by up to 3 times and improves imitation learning performance on the hardest level from 77% to 90.4%.

Imitation Learning

S2RMs: Spatially Structured Recurrent Modules

no code implementations 13 Jul 2020 Nasim Rahaman, Anirudh Goyal, Muhammad Waleed Gondal, Manuel Wuthrich, Stefan Bauer, Yash Sharma, Yoshua Bengio, Bernhard Schölkopf

Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalize well and are robust to changes in the input distribution.

Video Prediction

Revisiting Fundamentals of Experience Replay

2 code implementations ICML 2020 William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding.

DQN Replay Dataset Q-Learning

Compositional Generalization by Factorizing Alignment and Translation

no code implementations ACL 2020 Jacob Russin, Jason Jo, Randall C. O'Reilly, Yoshua Bengio

Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows for systematic generalization outside of the training distribution.

Machine Translation Systematic Generalization

Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

1 code implementation ICML 2020 Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

To effectively utilize the wealth of potential top-down information available, and to prevent the cacophony of intermixed signals in a bidirectional architecture, mechanisms are needed to restrict information flow.

Language Modelling Sequential Image Classification +1

Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems

no code implementations 29 Jun 2020 Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Sergey Levine, Charles Blundell, Yoshua Bengio, Michael Mozer

To use a video game as an illustration, two enemies of the same type will share schemata but will have separate object files to encode their distinct state (e.g., health, position).

Hybrid Models for Learning to Branch

1 code implementation NeurIPS 2020 Prateek Gupta, Maxime Gasse, Elias B. Khalil, M. Pawan Kumar, Andrea Lodi, Yoshua Bengio

First, in a more realistic setting where only a CPU is available, is the GNN model still competitive?

Image-to-image Mapping with Many Domains by Sparse Attribute Transfer

no code implementations 23 Jun 2020 Matthew Amodio, Rim Assouel, Victor Schmidt, Tristan Sylvain, Smita Krishnaswamy, Yoshua Bengio

Unsupervised image-to-image translation consists of learning a pair of mappings between two domains without known pairwise correspondences between points.

Unsupervised Image-To-Image Translation

Rethinking Distributional Matching Based Domain Adaptation

no code implementations 23 Jun 2020 Bo Li, Yezhen Wang, Tong Che, Shanghang Zhang, Sicheng Zhao, Pengfei Xu, Wei Zhou, Yoshua Bengio, Kurt Keutzer

In this paper, in order to devise robust DA algorithms, we first systematically analyze the limitations of DM based methods, and then build new benchmarks with more realistic domain shifts to evaluate the well-accepted DM methods.

Domain Adaptation

Untangling tradeoffs between recurrence and self-attention in neural networks

no code implementations 16 Jun 2020 Giancarlo Kerg, Bhargav Kanuparthi, Anirudh Goyal, Kyle Goyette, Yoshua Bengio, Guillaume Lajoie

Attention and self-attention mechanisms are now central to state-of-the-art deep learning on sequential tasks.

Learning Causal Models Online

1 code implementation 12 Jun 2020 Khurram Javed, Martha White, Yoshua Bengio

One solution for achieving strong generalization is to incorporate causal structures in the models; such structures constrain learning by ignoring correlations that contradict them.

Continual Learning

Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias

1 code implementation 6 Jun 2020 Axel Laborieux, Maxence Ernoult, Benjamin Scellier, Yoshua Bengio, Julie Grollier, Damien Querlioz

In this work, we show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon and that cancelling it allows training deep ConvNets by EP.
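
To my understanding, the bias referred to here is the first-order error of the one-sided finite-difference estimator used in standard EP, and the remedy studied in this line of work is a symmetric (two-sided) nudge; writing $g(\beta)$ for $\partial \Phi / \partial \theta$ evaluated at the fixed point obtained with nudging strength $\beta$, a Taylor expansion gives

$$\frac{g(\beta) - g(0)}{\beta} = g'(0) + O(\beta), \qquad \frac{g(\beta) - g(-\beta)}{2\beta} = g'(0) + O(\beta^2),$$

where $g'(0)$ is (up to the sign convention for $\Phi$) the loss gradient that EP estimates, so the symmetric estimator cancels the leading-order bias.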

Training End-to-End Analog Neural Networks with Equilibrium Propagation

no code implementations 2 Jun 2020 Jack Kendall, Ross Pantone, Kalpana Manickavasagam, Yoshua Bengio, Benjamin Scellier

We introduce a principled method to train end-to-end analog neural networks by stochastic gradient descent.

Learning the Arrow of Time for Problems in Reinforcement Learning

no code implementations ICLR 2020 Nasim Rahaman, Steffen Wolf, Anirudh Goyal, Roman Remme, Yoshua Bengio

We humans have an innate understanding of the asymmetric progression of time, which we use to efficiently and safely perceive and manipulate our environment.

Equilibrium Propagation with Continual Weight Updates

no code implementations 29 Apr 2020 Maxence Ernoult, Julie Grollier, Damien Querlioz, Yoshua Bengio, Benjamin Scellier

However, in existing implementations of EP, the learning rule is not local in time: the weight update is performed after the dynamics of the second phase have converged and requires information of the first phase that is no longer available physically.

Continual Weight Updates and Convolutional Architectures for Equilibrium Propagation

no code implementations 29 Apr 2020 Maxence Ernoult, Julie Grollier, Damien Querlioz, Yoshua Bengio, Benjamin Scellier

On the other hand, the biological plausibility of EP is limited by the fact that its learning rule is not local in time: the synapse update is performed after the dynamics of the second phase have converged and requires information of the first phase that is no longer available physically.

The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget

1 code implementation ICLR 2020 Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine

This is typically the case when we have a standard conditioning input, such as a state observation, and a "privileged" input, which might correspond to the goal of a task, the output of a costly planning algorithm, or communication with another agent.

Variational Inference

Experience Grounds Language

2 code implementations EMNLP 2020 Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian

Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates.

Representation Learning

Object-Centric Image Generation from Layouts

no code implementations 16 Mar 2020 Tristan Sylvain, Pengchuan Zhang, Yoshua Bengio, R. Devon Hjelm, Shikhar Sharma

In this paper, we start with the idea that a model must be able to understand individual objects and relationships between objects in order to generate complex scenes well.

Layout-to-Image Generation

Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling

3 code implementations NeurIPS 2020 Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio

To make that practical, we show that sampling from this modified density can be achieved by sampling in latent space according to an energy-based model induced by the sum of the latent prior log-density and the discriminator output score.
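
A minimal PyTorch sketch of that sampler under stated assumptions (the tiny G and D below are untrained stand-ins for a pretrained generator and discriminator; with a standard-normal latent prior the induced energy is E(z) = ||z||^2/2 - d(G(z)), where d is the discriminator logit), run with unadjusted Langevin dynamics in latent space:

import torch
import torch.nn as nn

latent_dim = 16
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 32))   # stand-in generator
D = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))            # stand-in discriminator (logit)

def latent_energy(z):
    # Energy induced by the N(0, I) prior plus the discriminator score on generated samples.
    return 0.5 * (z ** 2).sum(dim=1) - D(G(z)).squeeze(1)

def ddls_sample(n=8, steps=100, step_size=1e-2):
    z = torch.randn(n, latent_dim)
    for _ in range(steps):
        z = z.detach().requires_grad_(True)
        grad, = torch.autograd.grad(latent_energy(z).sum(), z)
        # Unadjusted Langevin update: drift down the energy plus Gaussian noise.
        z = z - 0.5 * step_size * grad + (step_size ** 0.5) * torch.randn_like(z)
    return G(z.detach())

print(ddls_sample().shape)  # torch.Size([8, 32])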

Image Generation

Benchmarking Graph Neural Networks

12 code implementations 2 Mar 2020 Vijay Prakash Dwivedi, Chaitanya K. Joshi, Thomas Laurent, Yoshua Bengio, Xavier Bresson

Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.

Graph Classification Graph Regression +2

On Catastrophic Interference in Atari 2600 Games

1 code implementation 28 Feb 2020 William Fedus, Dibya Ghosh, John D. Martin, Marc G. Bellemare, Yoshua Bengio, Hugo Larochelle

Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.

Atari Games

Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning

1 code implementation 20 Feb 2020 Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio

Disjoint Manifold Labeling: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.

Unsupervised Representation Learning

Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies

1 code implementation 12 Feb 2020 Giulia Zarpellon, Jason Jo, Andrea Lodi, Yoshua Bengio

We aim instead at learning a policy that generalizes across heterogeneous MILPs: our main hypothesis is that parameterizing the state of the B&B search tree can aid this type of generalization.

Imitation Learning

BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

no code implementations 8 Feb 2020 Miloš Nikolić, Ghouthi Boukli Hacene, Ciaran Bannon, Alberto Delmas Lascorz, Matthieu Courbariaux, Yoshua Bengio, Vincent Gripon, Andreas Moshovos

Neural networks have demonstrably achieved state-of-the art accuracy using low-bitlength integer quantization, yielding both execution time and energy benefits on existing hardware designs that support short bitlengths.

Quantization

Meta-learning framework with applications to zero-shot time-series forecasting

2 code implementations 7 Feb 2020 Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio

Can meta-learning discover generic ways of processing time series (TS) from a diverse dataset so as to greatly improve generalization on new TS coming from different datasets?

Meta-Learning Time Series +1

Combating False Negatives in Adversarial Imitation Learning

no code implementations 2 Feb 2020 Konrad Zolna, Chitwan Saharia, Leonard Boussioux, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior.

Imitation Learning

Using Simulated Data to Generate Images of Climate Change

no code implementations 26 Jan 2020 Gautier Cosne, Adrien Juraver, Mélisande Teng, Victor Schmidt, Vahe Vardanyan, Alexandra Luccioni, Yoshua Bengio

In our paper, we explore the potential of using images from a simulated 3D environment to improve a domain adaptation task carried out by the MUNIT architecture, aiming to use the resulting images to raise awareness of the potential future impacts of climate change.

Domain Adaptation

Multi-task self-supervised learning for Robust Speech Recognition

1 code implementation 25 Jan 2020 Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio

We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.

Robust Speech Recognition Self-Supervised Learning

Universal Successor Features for Transfer Reinforcement Learning

no code implementations ICLR 2019 Chen Ma, Dylan R. Ashley, Junfeng Wen, Yoshua Bengio

Transfer in Reinforcement Learning (RL) refers to the idea of applying knowledge gained from previous tasks to solving related tasks.

Transfer Reinforcement Learning

Learning from Learning Machines: Optimisation, Rules, and Social Norms

no code implementations 29 Dec 2019 Travis LaCroix, Yoshua Bengio

There is an analogy between machine learning systems and economic entities in that they are both adaptive, and their behaviour is specified in a more-or-less explicit way.

Decision Making

On the Morality of Artificial Intelligence

no code implementations 26 Dec 2019 Alexandra Luccioni, Yoshua Bengio

Much of the existing research on the social and ethical impact of Artificial Intelligence has been focused on defining ethical principles and guidelines surrounding Machine Learning (ML) and other Artificial Intelligence (AI) algorithms [IEEE, 2017, Jobin et al., 2019].

A learning-based algorithm to quickly compute good primal solutions for Stochastic Integer Programs

1 code implementation 17 Dec 2019 Yoshua Bengio, Emma Frejinger, Andrea Lodi, Rahul Patel, Sriram Sankaranarayanan

We propose a novel approach using supervised learning to obtain near-optimal primal solutions for two-stage stochastic integer programming (2SIP) problems with constraints in the first and second stages.

Joint Learning of Generative Translator and Classifier for Visually Similar Classes

no code implementations 15 Dec 2019 ByungIn Yoo, Tristan Sylvain, Yoshua Bengio, Junmo Kim

In this paper, we propose a Generative Translation Classification Network (GTCN) for improving visual classification accuracy in settings where classes are visually similar and data is scarce.

Data Augmentation Domain Adaptation +1

CLOSURE: Assessing Systematic Generalization of CLEVR Models

3 code implementations 12 Dec 2019 Dzmitry Bahdanau, Harm de Vries, Timothy J. O'Donnell, Shikhar Murty, Philippe Beaudoin, Yoshua Bengio, Aaron Courville

In this work, we study how systematic the generalization of such models is, that is to which extent they are capable of handling novel combinations of known linguistic constructs.

Few-Shot Learning Systematic Generalization +1

Applying Knowledge Transfer for Water Body Segmentation in Peru

no code implementations 2 Dec 2019 Jessenia Gonzalez, Debjani Bhowmick, Cesar Beltran, Kris Sankaran, Yoshua Bengio

In this work, we present the application of convolutional neural networks for segmenting water bodies in satellite images.

Transfer Learning

Automated curriculum generation for Policy Gradients from Demonstrations

1 code implementation 1 Dec 2019 Anirudh Srinivasan, Dzmitry Bahdanau, Maxime Chevalier-Boisvert, Yoshua Bengio

In this paper, we present a technique that improves the process of training an agent (using RL) for instruction following.

Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models

no code implementations 18 Nov 2019 Tong Che, Xiaofeng Liu, Site Li, Yubin Ge, Ruixiang Zhang, Caiming Xiong, Yoshua Bengio

We test the verifier network on out-of-distribution detection and adversarial example detection problems, as well as anomaly detection problems in structured prediction tasks such as image caption generation.

Anomaly Detection Autonomous Driving +2

Ghost Units Yield Biologically Plausible Backprop in Deep Neural Networks

no code implementations 15 Nov 2019 Thomas Mesnard, Gaetan Vignoud, Joao Sacramento, Walter Senn, Yoshua Bengio

This reduced system combines the essential elements to have a working biologically abstracted analogue of backpropagation with a simple formulation and proofs of the associated results.

Small-GAN: Speeding Up GAN Training Using Core-sets

no code implementations ICML 2020 Samarth Sinha, Han Zhang, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Augustus Odena

Recent work by Brock et al. (2018) suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large mini-batch sizes.

Active Learning Anomaly Detection +1

Establishing an Evaluation Metric to Quantify Climate Change Image Realism

no code implementations 22 Oct 2019 Sharon Zhou, Alexandra Luccioni, Gautier Cosne, Michael S. Bernstein, Yoshua Bengio

Because metrics for comparing the realism of different modes in a conditional generative model do not exist, we propose several automated and human-based methods for evaluation.

Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

1 code implementation 21 Oct 2019 Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

We release the largest public ECG dataset of continuous raw signals for representation learning containing 11 thousand patients and 2 billion labelled beats.

Unsupervised Representation Learning

Predicting ice flow using machine learning

no code implementations 20 Oct 2019 Yimeng Min, S. Karthik Mukkavilli, Yoshua Bengio

Though machine learning has achieved notable success in modeling sequential and spatial data for speech recognition and in computer vision, applications to remote sensing and climate science problems are seldom considered.

Optical Flow Estimation Speech Recognition

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

15 code implementations NeurIPS 2019 Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville

In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by introducing a set of architectural changes and simple training techniques.

Speech Synthesis

Variational Temporal Abstraction

1 code implementation NeurIPS 2019 Taesup Kim, Sungjin Ahn, Yoshua Bengio

We introduce a variational approach to learning and inference of temporally hierarchical structure and representation for sequential data.

Hierarchical structure

Learning Neural Causal Models from Unknown Interventions

2 code implementations 2 Oct 2019 Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Hugo Larochelle, Bernhard Schölkopf, Michael C. Mozer, Chris Pal, Yoshua Bengio

Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data.

Meta-Learning

Saliency is a Possible Red Herring When Diagnosing Poor Generalization

no code implementations ICLR 2021 Joseph D. Viviano, Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen

In some prediction tasks, such as for medical images, one may have some images with masks drawn by a human expert, indicating a region of the image containing relevant information to make the prediction.

General Classification Object Classification

GraphMix: Improved Training of GNNs for Semi-Supervised Learning

1 code implementation 25 Sep 2019 Vikas Verma, Meng Qu, Kenji Kawaguchi, Alex Lamb, Yoshua Bengio, Juho Kannala, Jian Tang

We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization.

Generalization Bounds Graph Attention +2

Recurrent Independent Mechanisms

4 code implementations ICLR 2021 Anirudh Goyal, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, Bernhard Schölkopf

Learning modular structures which reflect the dynamics of the environment can lead to better generalization and robustness to changes which only affect a few of the underlying causes.

Avoidance Learning Using Observational Reinforcement Learning

1 code implementation 24 Sep 2019 David Venuto, Leonard Boussioux, Junhao Wang, Rola Dali, Jhelum Chakravorty, Yoshua Bengio, Doina Precup

We define avoidance learning as the process of optimizing the agent's reward while avoiding dangerous behaviors given by a demonstrator.

Imitation Learning

Torchmeta: A Meta-Learning library for PyTorch

2 code implementations 14 Sep 2019 Tristan Deleu, Tobias Würfl, Mandana Samiei, Joseph Paul Cohen, Yoshua Bengio

The constant introduction of standardized benchmarks in the literature has helped accelerating the recent advances in meta-learning research.

Meta-Learning

Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures

1 code implementation 3 Sep 2019 Jordan Hoffmann, Louis Maestrati, Yoshihide Sawada, Jian Tang, Jean Michel Sellier, Yoshua Bengio

We present a method to encode and decode the position of atoms in 3-D molecules from a dataset of nearly 50,000 stable crystal unit cells that vary from containing 1 to over 100 atoms.

Drug Discovery Text Generation

Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization

1 code implementation ICCV 2019 Md Mahfuzur Rahman Siddiquee, Zongwei Zhou, Nima Tajbakhsh, Ruibin Feng, Michael B. Gotway, Yoshua Bengio, Jianming Liang

Qualitative and quantitative evaluations demonstrate that the proposed method outperforms the state of the art in multi-domain image-to-image translation and that it surpasses predominant weakly-supervised localization methods in both disease detection and localization.

Image-to-Image Translation

Weakly-supervised Knowledge Graph Alignment with Adversarial Learning

no code implementations ICLR 2019 Meng Qu, Jian Tang, Yoshua Bengio

Therefore, in this paper we propose to study aligning knowledge graphs in fully-unsupervised or weakly-supervised fashion, i.e., without or with only a few aligned triplets.

Knowledge Graphs

Learning the Arrow of Time

no code implementations 2 Jul 2019 Nasim Rahaman, Steffen Wolf, Anirudh Goyal, Roman Remme, Yoshua Bengio

We humans seem to have an innate understanding of the asymmetric progression of time, which we use to efficiently and safely perceive and manipulate our environment.

Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives

1 code implementation ICLR 2020 Anirudh Goyal, Shagun Sodhani, Jonathan Binas, Xue Bin Peng, Sergey Levine, Yoshua Bengio

Reinforcement learning agents that operate in diverse and complex environments can benefit from the structured decomposition of their behavior.

Hierarchical Reinforcement Learning

Perceptual Generative Autoencoders

2 code implementations ICML 2020 Zijun Zhang, Ruixiang Zhang, Zongpeng Li, Yoshua Bengio, Liam Paull

We therefore propose to map both the generated and target distributions to a latent space using the encoder of a standard autoencoder, and train the generator (or decoder) to match the target distribution in the latent space.

Unsupervised State Representation Learning in Atari

3 code implementations NeurIPS 2019 Ankesh Anand, Evan Racah, Sherjil Ozair, Yoshua Bengio, Marc-Alexandre Côté, R. Devon Hjelm

State representation learning, or the ability to capture latent generative factors of an environment, is crucial for building intelligent agents that can perform a wide variety of tasks.

Atari Games Representation Learning

On the interplay between noise and curvature and its effect on optimization and generalization

no code implementations 18 Jun 2019 Valentin Thomas, Fabian Pedregosa, Bart van Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio, Nicolas Le Roux

The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients.

Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Too Much Accuracy

2 code implementations 16 Jun 2019 Alex Lamb, Vikas Verma, Kenji Kawaguchi, Juho Kannala, Yoshua Bengio

Adversarial robustness has become a central goal in deep learning, both in the theory and the practice.

Conditional Computation for Continual Learning

no code implementations 16 Jun 2019 Min Lin, Jie Fu, Yoshua Bengio

In this study, we analyze parameter sharing under the conditional computation framework where the parameters of a neural network are conditioned on each input example.

Continual Learning

Learning Powerful Policies by Using Consistent Dynamics Model

1 code implementation 11 Jun 2019 Shagun Sodhani, Anirudh Goyal, Tristan Deleu, Yoshua Bengio, Sergey Levine, Jian Tang

There is enough evidence that humans build a model of the environment, not only by observing the environment but also by interacting with the environment.

Atari Games Model-based Reinforcement Learning

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

2 code implementations NeurIPS 2019 Devansh Arpit, Victor Campos, Yoshua Bengio

Finally, we show that using our initialization in conjunction with learning rate warmup is able to reduce the gap between the performance of weight normalized and batch normalized networks.

Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input

2 code implementations NeurIPS 2019 Maxence Ernoult, Julie Grollier, Damien Querlioz, Yoshua Bengio, Benjamin Scellier

Equilibrium Propagation (EP) is a biologically inspired learning algorithm for convergent recurrent neural networks, i.e., RNNs that are fed by a static input x and settle to a steady state.
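
For context, the EP weight update whose transient terms the paper compares with BPTT is a finite difference of $\partial \Phi / \partial \theta$ between the weakly nudged and free fixed points (standard EP notation, up to the sign convention used for the primitive function $\Phi$):

$$\Delta\theta \;\propto\; \frac{1}{\beta}\left(\frac{\partial \Phi}{\partial \theta}\big(x, s^{\beta}_{*}\big) - \frac{\partial \Phi}{\partial \theta}\big(x, s^{0}_{*}\big)\right),$$

which recovers the loss gradient computed by BPTT as $\beta \to 0$; the paper's point is that this correspondence also holds step by step during the transient phases, not only at convergence.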

Attention Based Pruning for Shift Networks

1 code implementation 29 May 2019 Ghouthi Boukli Hacene, Carlos Lassance, Vincent Gripon, Matthieu Courbariaux, Yoshua Bengio

In many application domains such as computer vision, Convolutional Layers (CLs) are key to the accuracy of deep learning methods.

Object Recognition

Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics

1 code implementation NeurIPS 2019 Giancarlo Kerg, Kyle Goyette, Maximilian Puelma Touzel, Gauthier Gidel, Eugene Vorontsov, Yoshua Bengio, Guillaume Lajoie

A recent strategy to circumvent the exploding and vanishing gradient problem in RNNs, and to allow the stable propagation of signals over long time scales, is to constrain recurrent connectivity matrices to be orthogonal or unitary.

The Journey is the Reward: Unsupervised Learning of Influential Trajectories

no code implementations 22 May 2019 Jonathan Binas, Sherjil Ozair, Yoshua Bengio

Unsupervised exploration and representation learning become increasingly important when learning in diverse and sparse environments.

Representation Learning

GMNN: Graph Markov Neural Networks

1 code implementation 15 May 2019 Meng Qu, Yoshua Bengio, Jian Tang

Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training.

Classification General Classification +3

Visualizing the Consequences of Climate Change Using Cycle-Consistent Adversarial Networks

no code implementations 2 May 2019 Victor Schmidt, Alexandra Luccioni, S. Karthik Mukkavilli, Narmada Balasooriya, Kris Sankaran, Jennifer Chayes, Yoshua Bengio

We present a project that aims to generate images that depict accurate, vivid, and personalized outcomes of climate change using Cycle-Consistent Adversarial Networks (CycleGANs).

A Walk with SGD: How SGD Explores Regions of Deep Network Loss?

no code implementations ICLR 2019 Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio

The non-convex nature of the loss landscape of deep neural networks (DNN) lends them the intuition that over the course of training, stochastic optimization algorithms explore different regions of the loss surface by entering and escaping many local minima due to the noise induced by mini-batches.

Stochastic Optimization

Probabilistic Planning with Sequential Monte Carlo methods

no code implementations ICLR 2019 Alexandre Piche, Valentin Thomas, Cyril Ibrahim, Yoshua Bengio, Chris Pal

In this work, we propose a novel formulation of planning which views it as a probabilistic inference problem over future optimal trajectories.

Continuous Control

Reinforced Imitation Learning from Observations

no code implementations ICLR 2019 Konrad Zolna, Negar Rostamzadeh, Yoshua Bengio, Sungjin Ahn, Pedro O. Pinheiro

Imitation learning is an effective alternative approach to learn a policy when the reward function is sparse.

Imitation Learning

Transfer and Exploration via the Information Bottleneck

1 code implementation ICLR 2019 Anirudh Goyal, Riashat Islam, DJ Strouse, Zafarali Ahmed, Hugo Larochelle, Matthew Botvinick, Yoshua Bengio, Sergey Levine

In new environments, this model can then identify novel subgoals for further exploration, guiding the agent through a sequence of potential decision states and through new regions of the state space.

Manifold Mixup: Learning Better Representations by Interpolating Hidden States

1 code implementation ICLR 2019 Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Aaron Courville, Ioannis Mitliagkas, Yoshua Bengio

Because the hidden states are learned, this has an important effect of encouraging the hidden states for a class to be concentrated in such a way so that interpolations within the same class or between two different classes do not intersect with the real data points from other classes.
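
A minimal PyTorch sketch of the hidden-state interpolation this refers to (the network, the layer choice, and the Beta(2, 2) coefficient are illustrative assumptions, not the authors' released code): a randomly chosen layer's activations for a shuffled pairing of the batch are mixed, and the one-hot targets are mixed with the same coefficient.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ManifoldMixupMLP(nn.Module):
    def __init__(self, d_in=20, d_hidden=64, n_classes=5):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(d_in, d_hidden), nn.Linear(d_hidden, d_hidden)])
        self.head = nn.Linear(d_hidden, n_classes)

    def forward(self, x, y_onehot=None, alpha=2.0):
        # Pick where to mix: 0 = raw inputs (standard mixup), k > 0 = hidden states entering layer k.
        mix_at = torch.randint(0, len(self.layers), (1,)).item() if y_onehot is not None else -1
        lam, perm, h = 1.0, None, x
        for k, layer in enumerate(self.layers):
            if k == mix_at:
                lam = torch.distributions.Beta(alpha, alpha).sample().item()
                perm = torch.randperm(h.size(0))
                h = lam * h + (1 - lam) * h[perm]          # interpolate (hidden) states
            h = F.relu(layer(h))
        logits = self.head(h)
        if y_onehot is None:
            return logits
        y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]  # interpolate targets with the same lambda
        return logits, y_mixed

model = ManifoldMixupMLP()
x = torch.randn(8, 20)
y = F.one_hot(torch.randint(0, 5, (8,)), num_classes=5).float()
logits, y_mixed = model(x, y)
loss = -(y_mixed * F.log_softmax(logits, dim=1)).sum(dim=1).mean()  # soft-target cross-entropy
loss.backward()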

EnGAN: Latent Space MCMC and Maximum Entropy Generators for Energy-based Models

no code implementations ICLR 2019 Rithesh Kumar, Anirudh Goyal, Aaron Courville, Yoshua Bengio

Unsupervised learning is about capturing dependencies between variables and is driven by the contrast between the probable vs improbable configurations of these variables, often either via a generative model which only samples probable ones or with an energy function (unnormalized log-density) which is low for probable ones and high for improbable ones.

Anomaly Detection

Unsupervised one-to-many image translation

no code implementations ICLR 2019 Samuel Lavoie-Marchildon, Sebastien Lachapelle, Mikołaj Bińkowski, Aaron Courville, Yoshua Bengio, R. Devon Hjelm

We perform completely unsupervised one-sided image to image translation between a source domain $X$ and a target domain $Y$ such that we preserve relevant underlying shared semantics (e.g., class, size, shape, etc.).

Unsupervised Image-To-Image Translation

Compositional generalization in a deep seq2seq model by separating syntax and semantics

1 code implementation 22 Apr 2019 Jake Russin, Jason Jo, Randall C. O'Reilly, Yoshua Bengio

Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows for systematic generalization outside of the training distribution.

Machine Translation Systematic Generalization

GradMask: Reduce Overfitting by Regularizing Saliency

no code implementations 16 Apr 2019 Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen

With too few samples or too many model parameters, overfitting can inhibit the ability to generalise predictions to new data.

Lesion Segmentation

Speech Model Pre-training for End-to-End Spoken Language Understanding

2 code implementations 7 Apr 2019 Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio

Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model.

Ranked #2 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Spoken Language Understanding

Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

1 code implementation 6 Apr 2019 Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure.

Distant Speech Recognition Hierarchical structure

Reinforced Imitation in Heterogeneous Action Space

no code implementations 6 Apr 2019 Konrad Zolna, Negar Rostamzadeh, Yoshua Bengio, Sungjin Ahn, Pedro O. Pinheiro

Imitation learning is an effective alternative approach to learn a policy when the reward function is sparse.

Imitation Learning

InfoMask: Masked Variational Latent Representation to Localize Chest Disease

no code implementations 28 Mar 2019 Saeid Asgari Taghanaki, Mohammad Havaei, Tess Berthier, Francis Dutil, Lisa Di Jorio, Ghassan Hamarneh, Yoshua Bengio

The scarcity of richly annotated medical images is limiting supervised deep learning based solutions to medical image analysis tasks, such as localizing discriminatory radiomic disease signatures.

Multiple Instance Learning

Wasserstein Dependency Measure for Representation Learning

no code implementations NeurIPS 2019 Sherjil Ozair, Corey Lynch, Yoshua Bengio, Aaron van den Oord, Sergey Levine, Pierre Sermanet

Mutual information maximization has emerged as a powerful learning objective for unsupervised representation learning obtaining state-of-the-art performance in applications such as object recognition, speech recognition, and reinforcement learning.

Object Recognition Speech Recognition +2

Towards Standardization of Data Licenses: The Montreal Data License

no code implementations 21 Mar 2019 Misha Benjamin, Paul Gagnon, Negar Rostamzadeh, Chris Pal, Yoshua Bengio, Alex Shee

This paper provides a taxonomy for the licensing of data in the fields of artificial intelligence and machine learning.

Gradient based sample selection for online continual learning

2 code implementations NeurIPS 2019 Rahaf Aljundi, Min Lin, Baptiste Goujaud, Yoshua Bengio

To prevent forgetting, a replay buffer is usually employed to store the previous data for the purpose of rehearsal.

Continual Learning

Interpolation Consistency Training for Semi-Supervised Learning

4 code implementations 9 Mar 2019 Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Yoshua Bengio, David Lopez-Paz

We introduce Interpolation Consistency Training (ICT), a simple and computation efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm.

General Classification Semi-Supervised Image Classification

On Adversarial Mixup Resynthesis

1 code implementation NeurIPS 2019 Christopher Beckham, Sina Honari, Vikas Verma, Alex Lamb, Farnoosh Ghadiri, R. Devon Hjelm, Yoshua Bengio, Christopher Pal

In this paper, we explore new approaches to combining information encoded within the learned representations of auto-encoders.

Resynthesis

Hyperbolic Discounting and Learning over Multiple Horizons

1 code implementation ICLR 2020 William Fedus, Carles Gelada, Yoshua Bengio, Marc G. Bellemare, Hugo Larochelle

Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process.

InfoBot: Transfer and Exploration via the Information Bottleneck

no code implementations 30 Jan 2019 Anirudh Goyal, Riashat Islam, Daniel Strouse, Zafarali Ahmed, Matthew Botvinick, Hugo Larochelle, Yoshua Bengio, Sergey Levine

In new environments, this model can then identify novel subgoals for further exploration, guiding the agent through a sequence of potential decision states and through new regions of the state space.

Maximum Entropy Generators for Energy-Based Models

2 code implementations 24 Jan 2019 Rithesh Kumar, Sherjil Ozair, Anirudh Goyal, Aaron Courville, Yoshua Bengio

Maximum likelihood estimation of energy-based models is a challenging problem due to the intractability of the log-likelihood gradient.
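
For reference, the intractability in question is the standard one for an energy-based model $p_\theta(x) \propto e^{-E_\theta(x)}$: the log-likelihood gradient contains an expectation under the model itself (the negative phase), which must be approximated with samples, e.g. via MCMC:

$$\nabla_\theta \log p_\theta(x) \;=\; -\nabla_\theta E_\theta(x) \;+\; \mathbb{E}_{x' \sim p_\theta}\!\big[\nabla_\theta E_\theta(x')\big].$$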

Anomaly Detection

Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information

no code implementations 22 Jan 2019 Eric Larsen, Sébastien Lachapelle, Yoshua Bengio, Emma Frejinger, Simon Lacoste-Julien, Andrea Lodi

We formulate the problem as a two-stage optimal prediction stochastic program whose solution we predict with a supervised machine learning algorithm.

The Benefits of Over-parameterization at Initialization in Deep ReLU Networks

1 code implementation 11 Jan 2019 Devansh Arpit, Yoshua Bengio

These results are derived using the PAC analysis framework, and hold true for finitely sized datasets such that the width of the ReLU network only needs to be larger than a certain finite lower bound.

Speech and Speaker Recognition from Raw Waveform with SincNet

2 code implementations 13 Dec 2018 Mirco Ravanelli, Yoshua Bengio

Deep neural networks can learn complex and abstract representations, that are progressively obtained by combining simpler ones.

Speaker Recognition Speech Recognition

An Empirical Study of Example Forgetting during Deep Neural Network Learning

1 code implementation ICLR 2019 Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon

Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks.

General Classification