Search Results for author: Yoshua Bengio

Found 514 papers, 247 papers with code

hBERT + BiasCorp - Fighting Racism on the Web

no code implementations EACL (LTEDI) 2021 Olawale Onabola, Zhuang Ma, Xie Yang, Benjamin Akera, Ibraheem Abdulrahman, Jia Xue, Dianbo Liu, Yoshua Bengio

In this work, we present hBERT, where we modify certain layers of the pretrained BERT model with the new Hopfield Layer.

Learning to Navigate in Synthetically Accessible Chemical Space Using Reinforcement Learning

1 code implementation ICML 2020 Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Hao-Ran Wei, Yashaswi Pathak, Shengchao Liu, Simon Blackburn, Karam Thomas, Connor Coley, Jian Tang, Sarath Chandar, Yoshua Bengio

In this work, we propose a novel reinforcement learning (RL) setup for drug discovery that addresses this challenge by embedding the concept of synthetic accessibility directly into the de novo compound design system.

Drug Discovery Navigate +2

Better Training of GFlowNets with Local Credit and Incomplete Trajectories

no code implementations 3 Feb 2023 Ling Pan, Nikolay Malkin, Dinghuai Zhang, Yoshua Bengio

Generative Flow Networks or GFlowNets are related to Monte-Carlo Markov chain methods (as they sample from a distribution specified by an energy function), reinforcement learning (as they learn a policy to sample composed objects through a sequence of steps), generative models (as they learn to represent and sample from a distribution) and amortized variational methods (as they can be used to learn to approximate and sample from an otherwise intractable posterior, given a prior and a likelihood).

GFlowNets for AI-Driven Scientific Discovery

no code implementations 1 Feb 2023 Moksh Jain, Tristan Deleu, Jason Hartford, Cheng-Hao Liu, Alex Hernandez-Garcia, Yoshua Bengio

However, in order to truly leverage large-scale data sets and high-throughput experimental setups, machine learning methods will need to be further improved and better integrated in the scientific discovery pipeline.

Efficient Exploration Experimental Design

Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport

1 code implementation 1 Feb 2023 Alexander Tong, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Kilian Fatras, Guy Wolf, Yoshua Bengio

CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models.

A theory of continuous generative flow networks

1 code implementation 30 Jan 2023 Salem Lahlou, Tristan Deleu, Pablo Lemos, Dinghuai Zhang, Alexandra Volokhova, Alex Hernández-García, Léna Néhale Ezzine, Yoshua Bengio, Nikolay Malkin

Generative flow networks (GFlowNets) are amortized variational inference algorithms that are trained to sample from unnormalized target distributions over compositional objects.

Variational Inference

Leveraging the Third Dimension in Contrastive Learning

no code implementations 27 Jan 2023 Sumukh Aithal, Anirudh Goyal, Alex Lamb, Yoshua Bengio, Michael Mozer

We evaluate these two approaches on three different SSL methods -- BYOL, SimSiam, and SwAV -- using ImageNette (10 class subset of ImageNet), ImageNet-100 and ImageNet-1k datasets.

Contrastive Learning Depth Estimation +2

Regeneration Learning: A Learning Paradigm for Data Generation

no code implementations 21 Jan 2023 Xu Tan, Tao Qin, Jiang Bian, Tie-Yan Liu, Yoshua Bengio

Regeneration learning extends the concept of representation learning to data generation tasks, and can be regarded as a counterpart of traditional representation learning, since 1) regeneration learning handles the abstraction (Y') of the target data Y for data generation while traditional representation learning handles the abstraction (X') of source data X for data understanding; 2) both the processes of Y'-->Y in regeneration learning and X-->X' in representation learning can be learned in a self-supervised way (e.g., pre-training); 3) both the mappings from X to Y' in regeneration learning and from X' to Y in representation learning are simpler than the direct mapping from X to Y.

Image Generation Representation Learning +6

MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

no code implementations 27 Dec 2022 Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels.

Data Augmentation

Synergies Between Disentanglement and Sparsity: a Multi-Task Learning Perspective

no code implementations 26 Nov 2022 Sébastien Lachapelle, Tristan Deleu, Divyat Mahajan, Ioannis Mitliagkas, Yoshua Bengio, Simon Lacoste-Julien, Quentin Bertrand

Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding is limited.

Disentanglement Meta-Learning +1

PhAST: Physics-Aware, Scalable, and Task-specific GNNs for Accelerated Catalyst Design

no code implementations 22 Nov 2022 Alexandre Duval, Victor Schmidt, Santiago Miret, Yoshua Bengio, Alex Hernández-García, David Rolnick

Catalyst materials play a crucial role in the electrochemical reactions involved in a great number of industrial processes key to this transition, such as renewable energy storage and electrofuel synthesis.

Latent Bottlenecked Attentive Neural Processes

no code implementations 15 Nov 2022 Leo Feng, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed

We demonstrate that LBANPs can trade off computational cost and performance according to the number of latent vectors.

Meta-Learning Multi-Armed Bandits

Equivariance with Learned Canonicalization Functions

no code implementations 11 Nov 2022 Sékou-Oumar Kaba, Arnab Kumar Mondal, Yan Zhang, Yoshua Bengio, Siamak Ravanbakhsh

Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations.

Posterior samples of source galaxies in strong gravitational lenses with score-based priors

no code implementations 7 Nov 2022 Alexandre Adam, Adam Coogan, Nikolay Malkin, Ronan Legin, Laurence Perreault-Levasseur, Yashar Hezaveh, Yoshua Bengio

Inferring accurate posteriors for high-dimensional representations of the brightness of gravitationally-lensed sources is a major challenge, in part due to the difficulties of accurately quantifying the priors.

A General Purpose Neural Architecture for Geospatial Systems

no code implementations 4 Nov 2022 Nasim Rahaman, Martin Weiss, Frederik Träuble, Francesco Locatello, Alexandre Lacoste, Yoshua Bengio, Chris Pal, Li Erran Li, Bernhard Schölkopf

Geospatial Information Systems are used by researchers and Humanitarian Assistance and Disaster Response (HADR) practitioners to support a wide variety of important applications.

Disaster Response Humanitarian +1

Bayesian learning of Causal Structure and Mechanisms with GFlowNets and Variational Bayes

no code implementations 4 Nov 2022 Mizu Nishikawa-Toomey, Tristan Deleu, Jithendaraa Subramanian, Yoshua Bengio, Laurent Charlin

We extend the method of Bayesian causal structure learning using GFlowNets to learn not only the posterior distribution over the structure, but also the parameters of a linear-Gaussian model.

Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

no code implementations 1 Nov 2022 Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi, Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet des Combes

Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reaching a diverse set of objectives.

Reinforcement Learning

Consistent Training via Energy-Based GFlowNets for Modeling Discrete Joint Distributions

no code implementations 1 Nov 2022 Chanakya Ekbote, Moksh Jain, Payel Das, Yoshua Bengio

We hypothesize that this can lead to incompatibility between the inductive optimization biases in training $R$ and in training the GFlowNet, potentially leading to worse samples and slow adaptation to changes in the distribution.

Active Learning

FL Games: A Federated Learning Framework for Distribution Shifts

no code implementations 31 Oct 2022 Sharut Gupta, Kartik Ahuja, Mohammad Havaei, Niladri Chatterjee, Yoshua Bengio

Federated learning aims to train predictive models for data that is distributed across clients, under the orchestration of a server.

Federated Learning

GFlowOut: Dropout with Generative Flow Networks

no code implementations 24 Oct 2022 Dianbo Liu, Moksh Jain, Bonaventure Dossou, Qianli Shen, Salem Lahlou, Anirudh Goyal, Nikolay Malkin, Chris Emezue, Dinghuai Zhang, Nadhir Hassen, Xu Ji, Kenji Kawaguchi, Yoshua Bengio

These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal which can be difficult to approximate with standard variational inference and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation.

Bayesian Inference Variational Inference

Multi-Objective GFlowNets

no code implementations 23 Oct 2022 Moksh Jain, Sharath Chandra Raparthy, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Yoshua Bengio, Santiago Miret, Emmanuel Bengio

Through a series of experiments on synthetic and benchmark tasks, we empirically demonstrate that MOGFNs outperform existing methods in terms of Hypervolume, R2-distance and candidate diversity.

Active Learning Drug Discovery

Neural Attentive Circuits

no code implementations 14 Oct 2022 Nasim Rahaman, Martin Weiss, Francesco Locatello, Chris Pal, Yoshua Bengio, Bernhard Schölkopf, Li Erran Li, Nicolas Ballas

Recent work has seen the development of general purpose neural architectures that can be trained to perform tasks across diverse data modalities.

Point Cloud Classification text-classification +1

Contrastive introspection (ConSpec) to rapidly identify invariant prototypes for success in RL

1 code implementation 12 Oct 2022 Chen Sun, Wannan Yang, Benjamin Alsbury-Nealy, Thomas Jiralerspong, Yoshua Bengio, Blake Richards

This takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon than it is to prospectively predict reward at every step taken in the environment.

Contrastive Learning Out-of-Distribution Generalization

MAgNet: Mesh Agnostic Neural PDE Solver

1 code implementation 11 Oct 2022 Oussama Boussif, Dan Assouline, Loubna Benabbou, Yoshua Bengio

The computational complexity of classical numerical methods for solving Partial Differential Equations (PDE) scales significantly as the resolution increases.

Generative Augmented Flow Networks

no code implementations 7 Oct 2022 Ling Pan, Dinghuai Zhang, Aaron Courville, Longbo Huang, Yoshua Bengio

We specify intermediate rewards by intrinsic motivation to tackle the exploration problem in sparse reward environments.

Stateful active facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning

1 code implementation 4 Oct 2022 Dianbo Liu, Vedant Shah, Oussama Boussif, Cristian Meo, Anirudh Goyal, Tianmin Shu, Michael Mozer, Nicolas Heess, Yoshua Bengio

We formalize the notions of coordination level and heterogeneity level of an environment and present HECOGrid, a suite of multi-agent RL environments that facilitates empirical evaluation of different MARL approaches across different levels of coordination and environmental heterogeneity by providing a quantitative control over coordination and heterogeneity levels of the environment.

Multi-agent Reinforcement Learning reinforcement-learning +1

Latent State Marginalization as a Low-cost Approach for Improving Exploration

no code implementations 3 Oct 2022 Dinghuai Zhang, Aaron Courville, Yoshua Bengio, Qinqing Zheng, Amy Zhang, Ricky T. Q. Chen

While the maximum entropy (MaxEnt) reinforcement learning (RL) framework -- often touted for its exploration and robustness capabilities -- is usually motivated from a probabilistic perspective, the use of deep probabilistic models has not gained much traction in practice due to their inherent complexity.

Continuous Control SMAC+

GFlowNets and variational inference

no code implementations 2 Oct 2022 Nikolay Malkin, Salem Lahlou, Tristan Deleu, Xu Ji, Edward Hu, Katie Everett, Dinghuai Zhang, Yoshua Bengio

This paper builds bridges between two families of probabilistic algorithms: (hierarchical) variational inference (VI), which is typically used to model distributions over continuous spaces, and generative flow networks (GFlowNets), which have been used for distributions over discrete structures such as graphs.

Reinforcement Learning Variational Inference

Learning GFlowNets from partial episodes for improved convergence and stability

1 code implementation 26 Sep 2022 Kanika Madan, Jarrid Rector-Brooks, Maksym Korablyov, Emmanuel Bengio, Moksh Jain, Andrei Nica, Tom Bosc, Yoshua Bengio, Nikolay Malkin

Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks.

Interventional Causal Representation Learning

no code implementations 24 Sep 2022 Kartik Ahuja, Yixin Wang, Divyat Mahajan, Yoshua Bengio

Most existing works focus on identifiable representation learning with observational data, relying on distributional assumptions on latent (causal) factors.

Representation Learning

Graph-Based Active Machine Learning Method for Diverse and Novel Antimicrobial Peptides Generation and Selection

no code implementations 18 Sep 2022 Bonaventure F. P. Dossou, Dianbo Liu, Xu Ji, Moksh Jain, Almer M. van der Sloot, Roger Palou, Michael Tyers, Yoshua Bengio

As antibiotic-resistant bacterial strains are rapidly spreading worldwide, infections caused by these strains are emerging as a global crisis causing the death of millions of people every year.

Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization

no code implementations 13 Sep 2022 Leo Feng, Padideh Nouri, Aneri Muni, Yoshua Bengio, Pierre-Luc Bacon

The problem can be framed as a global optimization problem where the objective is an expensive black-box function that can be queried in large batches, but only over a small number of rounds.

Meta-Learning Meta Reinforcement Learning +2

Unifying Generative Models with GFlowNets and Beyond

no code implementations 6 Sep 2022 Dinghuai Zhang, Ricky T. Q. Chen, Nikolay Malkin, Yoshua Bengio

Our framework provides a means for unifying training and inference algorithms, and provides a route to shine a unifying light over many generative models.

Decision Making

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

1 code implementation 15 Aug 2022 Tianyu Zhang, Andrew Williams, Soham Phade, Sunil Srinivasa, Yang Zhang, Prateek Gupta, Yoshua Bengio, Stephan Zheng

To facilitate this research, here we introduce RICE-N, a multi-region integrated assessment model that simulates the global climate and economy, and which can be used to design and evaluate the strategic outcomes for different negotiation and agreement frameworks.

Ethics Multi-agent Reinforcement Learning

Diversifying Design of Nucleic Acid Aptamers Using Unsupervised Machine Learning

no code implementations 10 Aug 2022 Siba Moussa, Michael Kilgour, Clara Jans, Alex Hernandez-Garcia, Miroslava Cuperlovic-Culf, Yoshua Bengio, Lena Simine

Inverse design of short single-stranded RNA and DNA sequences (aptamers) is the task of finding sequences that satisfy a set of desired criteria.

Discrete Key-Value Bottleneck

no code implementations 22 Jul 2022 Frederik Träuble, Anirudh Goyal, Nasim Rahaman, Michael Mozer, Kenji Kawaguchi, Yoshua Bengio, Bernhard Schölkopf

We theoretically investigate the ability of the proposed model to minimize the effect of the distribution shifts and show that such a discrete bottleneck with (key, value) pairs reduces the complexity of the hypothesis class.

Lookback for Learning to Branch

no code implementations 30 Jun 2022 Prateek Gupta, Elias B. Khalil, Didier Chételat, Maxime Gasse, Yoshua Bengio, Andrea Lodi, M. Pawan Kumar

Given that B&B results in a tree of sub-MILPs, we ask (a) whether there are strong dependencies exhibited by the target heuristic among the neighboring nodes of the B&B tree, and (b) if so, whether we can incorporate them in our training procedure.

Model Selection Variable Selection

On Neural Architecture Inductive Biases for Relational Tasks

1 code implementation 9 Jun 2022 Giancarlo Kerg, Sarthak Mittal, David Rolnick, Yoshua Bengio, Blake Richards, Guillaume Lajoie

Recent work has explored how forcing relational representations to remain distinct from sensory representations, as seems to be the case in the brain, can help artificial systems.

Inductive Bias Out-of-Distribution Generalization

On the Generalization and Adaption Performance of Causal Models

no code implementations 9 Jun 2022 Nino Scherrer, Anirudh Goyal, Stefan Bauer, Yoshua Bengio, Nan Rosemary Ke

Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes and offer robust generalization.

Causal Discovery Out-of-Distribution Generalization

Building Robust Ensembles via Margin Boosting

1 code implementation 7 Jun 2022 Dinghuai Zhang, Hongyang Zhang, Aaron Courville, Yoshua Bengio, Pradeep Ravikumar, Arun Sai Suggala

Consequently, an emerging line of work has focused on learning an ensemble of neural networks to defend against adversarial attacks.

Adversarial Robustness

Is a Modular Architecture Enough?

1 code implementation 6 Jun 2022 Sarthak Mittal, Yoshua Bengio, Guillaume Lajoie

Inspired by human cognition, machine learning systems are gradually revealing advantages of sparser and more modular architectures.

Out-of-Distribution Generalization

Weakly Supervised Representation Learning with Sparse Perturbations

1 code implementation 2 Jun 2022 Kartik Ahuja, Jason Hartford, Yoshua Bengio

We show that if the perturbations are applied only on mutually exclusive blocks of latents, we identify the latents up to those blocks.

Representation Learning

Agnostic Physics-Driven Deep Learning

no code implementations 30 May 2022 Benjamin Scellier, Siddhartha Mishra, Yoshua Bengio, Yann Ollivier

This work establishes that a physical system can perform statistical learning without gradient computations, via an Agnostic Equilibrium Propagation (Aeqprop) procedure that combines energy minimization, homeostatic control, and nudging towards the correct response.

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning

no code implementations 30 May 2022 Aniket Didolkar, Kshitij Gupta, Anirudh Goyal, Nitesh B. Gundavarapu, Alex Lamb, Nan Rosemary Ke, Yoshua Bengio

A slow stream that is recurrent in nature aims to learn a specialized and compressed representation, by forcing chunks of $K$ time steps into a single representation which is divided into multiple vectors.

Decision Making Inductive Bias

FL Games: A federated learning framework for distribution shifts

no code implementations 23 May 2022 Sharut Gupta, Kartik Ahuja, Mohammad Havaei, Niladri Chatterjee, Yoshua Bengio

Federated learning aims to train predictive models for data that is distributed across clients, under the orchestration of a server.

Federated Learning

FedILC: Weighted Geometric Mean and Invariant Gradient Covariance for Federated Learning on Non-IID Data

1 code implementation 19 May 2022 Mike He Zhu, Léna Néhale Ezzine, Dianbo Liu, Yoshua Bengio

Federated learning is a distributed machine learning approach which enables a shared server model to learn by aggregating the locally-computed parameter updates with the training data from spatially-distributed client silos.

Federated Learning

A Highly Adaptive Acoustic Model for Accurate Multi-Dialect Speech Recognition

no code implementations 6 May 2022 Sanghyun Yoo, Inchul Song, Yoshua Bengio

In this paper, we propose a novel acoustic modeling technique for accurate multi-dialect speech recognition with a single AM.

Speech Recognition

Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL

no code implementations 21 Mar 2022 Akram Erraqabi, Marlos C. Machado, Mingde Zhao, Sainbayar Sukhbaatar, Alessandro Lazaric, Ludovic Denoyer, Yoshua Bengio

In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping.

Continuous Control Contrastive Learning +1

Continuous-Time Meta-Learning with Forward Mode Differentiation

no code implementations ICLR 2022 Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon

Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.

Few-Shot Image Classification Meta-Learning

Biological Sequence Design with GFlowNets

1 code implementation 2 Mar 2022 Moksh Jain, Emmanuel Bengio, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Bonaventure F. P. Dossou, Chanakya Ekbote, Jie Fu, Tianyu Zhang, Michael Kilgour, Dinghuai Zhang, Lena Simine, Payel Das, Yoshua Bengio

In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective of obtaining a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round.

Active Learning

Combining Modular Skills in Multitask Learning

1 code implementation 28 Feb 2022 Edoardo M. Ponti, Alessandro Sordoni, Yoshua Bengio, Siva Reddy

By jointly learning these and a task-skill allocation matrix, the network for each task is instantiated as the average of the parameters of active skills.

Instruction Following reinforcement-learning +1

Bayesian Structure Learning with Generative Flow Networks

1 code implementation 28 Feb 2022 Tristan Deleu, António Góis, Chris Emezue, Mansi Rankawat, Simon Lacoste-Julien, Stefan Bauer, Yoshua Bengio

In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) structure of Bayesian networks, from data.

Variational Inference

Generative Flow Networks for Discrete Probabilistic Modeling

1 code implementation 3 Feb 2022 Dinghuai Zhang, Nikolay Malkin, Zhen Liu, Alexandra Volokhova, Aaron Courville, Yoshua Bengio

We present energy-based generative flow networks (EB-GFN), a novel probabilistic modeling algorithm for high-dimensional discrete data.

Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization

no code implementations 2 Feb 2022 Dianbo Liu, Alex Lamb, Xu Ji, Pascal Notsawo, Mike Mozer, Yoshua Bengio, Kenji Kawaguchi

Vector Quantization (VQ) is a method for discretizing latent representations and has become a major part of the deep learning toolkit.

Quantization reinforcement-learning +2

Trajectory balance: Improved credit assignment in GFlowNets

1 code implementation 31 Jan 2022 Nikolay Malkin, Moksh Jain, Emmanuel Bengio, Chen Sun, Yoshua Bengio

Generative flow networks (GFlowNets) are a method for learning a stochastic policy for generating compositional objects, such as graphs or strings, from a given unnormalized density by sequences of actions, where many possible action sequences may lead to the same object.

Towards Scaling Difference Target Propagation by Learning Backprop Targets

1 code implementation 31 Jan 2022 Maxence Ernoult, Fabrice Normandin, Abhinav Moudgil, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio

As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks.

The Effect of Diversity in Meta-Learning

1 code implementation 27 Jan 2022 Ramnath Kumar, Tristan Deleu, Yoshua Bengio

Recent studies show that task distribution plays a vital role in the meta-learner's performance.

Few-Shot Learning

Rethinking Learning Dynamics in RL using Adversarial Networks

1 code implementation 27 Jan 2022 Ramnath Kumar, Tristan Deleu, Yoshua Bengio

We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.

Reinforcement Learning

Multi-scale Feature Learning Dynamics: Insights for Double Descent

1 code implementation 6 Dec 2021 Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie

A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters.

GFlowNet Foundations

no code implementations 17 Nov 2021 Yoshua Bengio, Salem Lahlou, Tristan Deleu, Edward J. Hu, Mo Tiwari, Emmanuel Bengio

Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function.

Active Learning

Properties from Mechanisms: An Equivariance Perspective on Identifiable Representation Learning

no code implementations ICLR 2022 Kartik Ahuja, Jason Hartford, Yoshua Bengio

These results suggest that by exploiting inductive biases on mechanisms, it is possible to design a range of new identifiable representation learning approaches.

Representation Learning

Chunked Autoregressive GAN for Conditional Waveform Synthesis

1 code implementation ICLR 2022 Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio

We show that simple pitch and periodicity conditioning is insufficient for reducing this error relative to using autoregression.

Inductive Bias

Compositional Attention: Disentangling Search and Retrieval

3 code implementations ICLR 2022 Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie

Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed.

Retrieval

Unifying Likelihood-free Inference with Black-box Optimization and Beyond

no code implementations ICLR 2022 Dinghuai Zhang, Jie Fu, Yoshua Bengio, Aaron Courville

Black-box optimization formulations for biological sequence design have drawn recent attention due to their promising potential impact on the pharmaceutical industry.

Drug Discovery

Divide and Explore: Multi-Agent Separate Exploration with Shared Intrinsic Motivations

no code implementations 29 Sep 2021 Xiao Jing, Zhenwei Zhu, Hongliang Li, Xin Pei, Yoshua Bengio, Tong Che, Hongyong Song

One of the greatest challenges of reinforcement learning is efficient exploration, especially when training signals are sparse or deceptive.

Distributed Computing Efficient Exploration

Discrete-Valued Neural Communication

no code implementations NeurIPS 2021 Dianbo Liu, Alex Lamb, Kenji Kawaguchi, Anirudh Goyal, Chen Sun, Michael Curtis Mozer, Yoshua Bengio

Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes.

Quantization Systematic Generalization

The Causal-Neural Connection: Expressiveness, Learnability, and Inference

2 code implementations NeurIPS 2021 Kevin Xia, Kai-Zhan Lee, Yoshua Bengio, Elias Bareinboim

Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM.

Causal Identification Causal Inference +1

Variational Causal Networks: Approximate Bayesian Inference over Causal Structures

1 code implementation 14 Jun 2021 Yashas Annadani, Jonas Rothfuss, Alexandre Lacoste, Nino Scherrer, Anirudh Goyal, Yoshua Bengio, Stefan Bauer

However, acting intelligently upon knowledge about causal structure that has been inferred from finite data requires reasoning about its uncertainty.

Bayesian Inference Causal Inference +2

Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

2 code implementations NeurIPS 2021 Emmanuel Bengio, Moksh Jain, Maksym Korablyov, Doina Precup, Yoshua Bengio

Using insights from Temporal Difference learning, we propose GFlowNet, based on a view of the generative process as a flow network, making it possible to handle the tricky case where different trajectories can yield the same final state, e.g., there are many ways to sequentially add atoms to generate some molecular graph.

Fast and Slow Learning of Recurrent Independent Mechanisms

no code implementations 18 May 2021 Kanika Madan, Nan Rosemary Ke, Anirudh Goyal, Bernhard Schölkopf, Yoshua Bengio

To study these ideas, we propose a particular training framework in which we assume that the pieces of knowledge an agent needs and its reward function are stationary and can be re-used across tasks.

Meta-Learning

An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming

1 code implementation 15 May 2021 Minkai Xu, Wujie Wang, Shitong Luo, Chence Shi, Yoshua Bengio, Rafael Gomez-Bombarelli, Jian Tang

Specifically, the molecular graph is first encoded in a latent space, and then the 3D structures are generated by solving a principled bilevel optimization program.

Bilevel Optimization

HBert + BiasCorp -- Fighting Racism on the Web

no code implementations 6 Apr 2021 Olawale Onabola, Zhuang Ma, Yang Xie, Benjamin Akera, Abdulrahman Ibraheem, Jia Xue, Dianbo Liu, Yoshua Bengio

In this work, we present hBERT, where we modify certain layers of the pretrained BERT model with the new Hopfield Layer.

Transformers with Competitive Ensembles of Independent Mechanisms

no code implementations 27 Feb 2021 Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio

In this work we explore a way in which the Transformer architecture is deficient: it represents each position with a large monolithic hidden representation and a single set of parameters which are applied over the entire hidden representation.

Speech Enhancement

Learning Neural Generative Dynamics for Molecular Conformation Generation

3 code implementations ICLR 2021 Minkai Xu, Shitong Luo, Yoshua Bengio, Jian Peng, Jian Tang

Inspired by the recent progress in deep generative models, in this paper, we propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.

Structured Sparsity Inducing Adaptive Optimizers for Deep Learning

1 code implementation 7 Feb 2021 Tristan Deleu, Yoshua Bengio

The parameters of a neural network are naturally organized in groups, some of which might not contribute to its overall performance.

Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias

no code implementations 14 Jan 2021 Axel Laborieux, Maxence Ernoult, Benjamin Scellier, Yoshua Bengio, Julie Grollier, Damien Querlioz

Equilibrium Propagation (EP) is a biologically-inspired counterpart of Backpropagation Through Time (BPTT) which, owing to its strong theoretical guarantees and the locality in space of its learning rule, fosters the design of energy-efficient hardware dedicated to learning.

Spatially Structured Recurrent Modules

no code implementations ICLR 2021 Nasim Rahaman, Anirudh Goyal, Muhammad Waleed Gondal, Manuel Wuthrich, Stefan Bauer, Yash Sharma, Yoshua Bengio, Bernhard Schölkopf

Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalise well and are robust to changes in the input distribution.

Starcraft II Video Prediction

Dependency Structure Discovery from Interventions

no code implementations 1 Jan 2021 Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Bernhard Schölkopf, Michael Curtis Mozer, Hugo Larochelle, Christopher Pal, Yoshua Bengio

Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data.

Systematic generalisation with group invariant predictions

no code implementations ICLR 2021 Faruk Ahmed, Yoshua Bengio, Harm van Seijen, Aaron Courville

We consider situations where the presence of dominant simpler correlations with the target variable in a training set can cause an SGD-trained neural network to be less reliant on more persistently-correlating complex features.

Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments

no code implementations ICLR 2021 Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Charles Blundell, Sergey Levine, Yoshua Bengio, Michael Curtis Mozer

To use a video game as an illustration, two enemies of the same type will share schemata but will have separate object files to encode their distinct state (e.g., health, position).

Neural Bayes: A Generic Parameterization Method for Unsupervised Learning

no code implementations 1 Jan 2021 Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio

Disjoint Manifold Separation: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.

Representation Learning

FloW: A Dataset and Benchmark for Floating Waste Detection in Inland Waters

1 code implementation ICCV 2021 Yuwei Cheng, Jiannan Zhu, Mengxin Jiang, Jie Fu, Changsong Pang, Peidong Wang, Kris Sankaran, Olawale Onabola, Yimin Liu, Dianbo Liu, Yoshua Bengio

To promote the practical application for autonomous floating wastes cleaning, we present FloW, the first dataset for floating waste detection in inland water areas.

object-detection Robust Object Detection

Inductive Biases for Deep Learning of Higher-Level Cognition

no code implementations 30 Nov 2020 Anirudh Goyal, Yoshua Bengio

A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopedic list of heuristics).

Systematic Generalization

RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design

no code implementations 25 Nov 2020 Cheng-Hao Liu, Maksym Korablyov, Stanisław Jastrzębski, Paweł Włodarczyk-Pruszyński, Yoshua Bengio, Marwin H. S. Segler

A natural idea to mitigate this problem is to bias the search process towards more easily synthesizable molecules using a proxy for synthetic accessibility.

Predicting Infectiousness for Proactive Contact Tracing

1 code implementation ICLR 2021 Yoshua Bengio, Prateek Gupta, Tegan Maharaj, Nasim Rahaman, Martin Weiss, Tristan Deleu, Eilif Muller, Meng Qu, Victor Schmidt, Pierre-Luc St-Charles, Hannah Alsdurf, Olexa Bilaniuk, David Buckeridge, Gaétan Marceau Caron, Pierre-Luc Carrier, Joumana Ghosn, Satya Ortiz-Gagne, Chris Pal, Irina Rish, Bernhard Schölkopf, Abhinav Sharma, Jian Tang, Andrew Williams

Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT).

NU-GAN: High resolution neural upsampling with GAN

no code implementations 22 Oct 2020 Rithesh Kumar, Kundan Kumar, Vicki Anand, Yoshua Bengio, Aaron Courville

In this paper, we propose NU-GAN, a new method for resampling audio from lower to higher sampling rates (upsampling).

Audio Generation Speech Synthesis

Cross-Modal Information Maximization for Medical Imaging: CMIM

no code implementations 20 Oct 2020 Tristan Sylvain, Francis Dutil, Tess Berthier, Lisa Di Jorio, Margaux Luck, Devon Hjelm, Yoshua Bengio

In hospitals, data are siloed to specific information systems that make the same information available under different modalities such as the different medical imaging exams the patient undergoes (CT scans, MRI, PET, Ultrasound, etc.).

Image Classification Medical Image Classification

Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers

no code implementations 15 Oct 2020 Alex Lamb, Anirudh Goyal, Agnieszka Słowik, Michael Mozer, Philippe Beaudoin, Yoshua Bengio

Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer.

Domain Generalization

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

no code implementations ICLR 2021 Ossama Ahmed, Frederik Träuble, Anirudh Goyal, Alexander Neitz, Yoshua Bengio, Bernhard Schölkopf, Manuel Wüthrich, Stefan Bauer

To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment.

Transfer Learning

RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs

2 code implementations ICLR 2021 Meng Qu, Junkun Chen, Louis-Pascal Xhonneux, Yoshua Bengio, Jian Tang

Then in the E-step, we select a set of high-quality rules from all generated rules with both the rule generator and reasoning predictor via posterior inference; and in the M-step, the rule generator is updated with the rules selected in the E-step.

Knowledge Graphs

Visual Concept Reasoning Networks

no code implementations 26 Aug 2020 Taesup Kim, Sungwoong Kim, Yoshua Bengio

It approximates sparsely connected networks by explicitly defining multiple branches to simultaneously learn representations with different visual concepts or properties.

Action Recognition Image Classification +4

Mastering Rate based Curriculum Learning

1 code implementation 14 Aug 2020 Lucas Willems, Salem Lahlou, Yoshua Bengio

Recent automatic curriculum learning algorithms, and in particular Teacher-Student algorithms, rely on the notion of learning progress, making the assumption that the good next tasks are the ones on which the learner is making the fastest progress or digress.

Deriving Differential Target Propagation from Iterating Approximate Inverses

no code implementations 29 Jul 2020 Yoshua Bengio

We show that a particular form of target propagation, i.e., relying on learned inverses of each layer, which is differential, i.e., where the target is a small perturbation of the forward propagation, gives rise to an update rule which corresponds to an approximate Gauss-Newton gradient-based optimization, without requiring the manipulation or inversion of large matrices.

BabyAI 1.1

3 code implementations 24 Jul 2020 David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

This increases reinforcement learning sample efficiency by up to 3 times and improves imitation learning performance on the hardest level from 77% to 90.4%.

Imitation Learning reinforcement-learning +1

S2RMs: Spatially Structured Recurrent Modules

no code implementations 13 Jul 2020 Nasim Rahaman, Anirudh Goyal, Muhammad Waleed Gondal, Manuel Wuthrich, Stefan Bauer, Yash Sharma, Yoshua Bengio, Bernhard Schölkopf

Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalize well and are robust to changes in the input distribution.

Starcraft II Video Prediction

Revisiting Fundamentals of Experience Replay

2 code implementations ICML 2020 William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding.

DQN Replay Dataset Q-Learning

Compositional Generalization by Factorizing Alignment and Translation

no code implementations ACL 2020 Jacob Russin, Jason Jo, Randall O'Reilly, Yoshua Bengio

Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows for systematic generalization outside of the training distribution.

Machine Translation Systematic Generalization +1

Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

1 code implementation ICML 2020 Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

To effectively utilize the wealth of potential top-down information available, and to prevent the cacophony of intermixed signals in a bidirectional architecture, mechanisms are needed to restrict information flow.

Language Modelling Sequential Image Classification +1

Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems

no code implementations 29 Jun 2020 Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Sergey Levine, Charles Blundell, Yoshua Bengio, Michael Mozer

To use a video game as an illustration, two enemies of the same type will share schemata but will have separate object files to encode their distinct state (e.g., health, position).

Hybrid Models for Learning to Branch

1 code implementation NeurIPS 2020 Prateek Gupta, Maxime Gasse, Elias B. Khalil, M. Pawan Kumar, Andrea Lodi, Yoshua Bengio

First, in a more realistic setting where only a CPU is available, is the GNN model still competitive?

Image-to-image Mapping with Many Domains by Sparse Attribute Transfer

no code implementations 23 Jun 2020 Matthew Amodio, Rim Assouel, Victor Schmidt, Tristan Sylvain, Smita Krishnaswamy, Yoshua Bengio

Unsupervised image-to-image translation consists of learning a pair of mappings between two domains without known pairwise correspondences between points.

Translation Unsupervised Image-To-Image Translation

Rethinking Distributional Matching Based Domain Adaptation

no code implementations 23 Jun 2020 Bo Li, Yezhen Wang, Tong Che, Shanghang Zhang, Sicheng Zhao, Pengfei Xu, Wei Zhou, Yoshua Bengio, Kurt Keutzer

In this paper, in order to devise robust DA algorithms, we first systematically analyze the limitations of DM based methods, and then build new benchmarks with more realistic domain shifts to evaluate the well-accepted DM methods.

Domain Adaptation

Learning Causal Models Online

1 code implementation 12 Jun 2020 Khurram Javed, Martha White, Yoshua Bengio

One solution for achieving strong generalization is to incorporate causal structures in the models; such structures constrain learning by ignoring correlations that contradict them.

Continual Learning

Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias

1 code implementation 6 Jun 2020 Axel Laborieux, Maxence Ernoult, Benjamin Scellier, Yoshua Bengio, Julie Grollier, Damien Querlioz

In this work, we show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon and that cancelling it allows training deep ConvNets by EP.

Training End-to-End Analog Neural Networks with Equilibrium Propagation

no code implementations 2 Jun 2020 Jack Kendall, Ross Pantone, Kalpana Manickavasagam, Yoshua Bengio, Benjamin Scellier

We introduce a principled method to train end-to-end analog neural networks by stochastic gradient descent.

Learning the Arrow of Time for Problems in Reinforcement Learning

no code implementations ICLR 2020 Nasim Rahaman, Steffen Wolf, Anirudh Goyal, Roman Remme, Yoshua Bengio

We humans have an innate understanding of the asymmetric progression of time, which we use to efficiently and safely perceive and manipulate our environment.

reinforcement-learning reinforcement Learning

Equilibrium Propagation with Continual Weight Updates

no code implementations 29 Apr 2020 Maxence Ernoult, Julie Grollier, Damien Querlioz, Yoshua Bengio, Benjamin Scellier

However, in existing implementations of EP, the learning rule is not local in time: the weight update is performed after the dynamics of the second phase have converged and requires information of the first phase that is no longer available physically.

Continual Weight Updates and Convolutional Architectures for Equilibrium Propagation

no code implementations 29 Apr 2020 Maxence Ernoult, Julie Grollier, Damien Querlioz, Yoshua Bengio, Benjamin Scellier

On the other hand, the biological plausibility of EP is limited by the fact that its learning rule is not local in time: the synapse update is performed after the dynamics of the second phase have converged and requires information of the first phase that is no longer available physically.

The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget

1 code implementation ICLR 2020 Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine

This is typically the case when we have a standard conditioning input, such as a state observation, and a "privileged" input, which might correspond to the goal of a task, the output of a costly planning algorithm, or communication with another agent.

reinforcement-learning reinforcement Learning +1

Experience Grounds Language

2 code implementations EMNLP 2020 Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian

Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates.

Representation Learning

Object-Centric Image Generation from Layouts

no code implementations 16 Mar 2020 Tristan Sylvain, Pengchuan Zhang, Yoshua Bengio, R. Devon Hjelm, Shikhar Sharma

In this paper, we start with the idea that a model must be able to understand individual objects and relationships between objects in order to generate complex scenes well.

Layout-to-Image Generation

Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling

3 code implementations NeurIPS 2020 Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio

To make that practical, we show that sampling from this modified density can be achieved by sampling in latent space according to an energy-based model induced by the sum of the latent prior log-density and the discriminator output score.

Image Generation

Benchmarking Graph Neural Networks

14 code implementations 2 Mar 2020 Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, Xavier Bresson

In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.

Graph Classification Graph Regression +2

On Catastrophic Interference in Atari 2600 Games

1 code implementation 28 Feb 2020 William Fedus, Dibya Ghosh, John D. Martin, Marc G. Bellemare, Yoshua Bengio, Hugo Larochelle

Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.

Atari Games reinforcement-learning +1

Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning

1 code implementation 20 Feb 2020 Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio

Disjoint Manifold Labeling: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.

Representation Learning