Search Results for author: Doina Precup

Found 139 papers, 44 papers with code

Attention Option-Critic

no code implementations ICML Workshop LifelongML 2020 Raviteja Chunduru, Doina Precup

Temporal abstraction in reinforcement learning is the ability of an agent to learn and use high-level behaviors, called options.

Atari Games Transfer Learning

Importance of Empirical Sample Complexity Analysis for Offline Reinforcement Learning

no code implementations 31 Dec 2021 Samin Yeasar Arnob, Riashat Islam, Doina Precup

We hypothesize that empirically studying the sample complexity of offline reinforcement learning (RL) is crucial for the practical applications of RL in the real world.

Offline RL

Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates

no code implementations 30 Dec 2021 Safa Alver, Doina Precup

We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks with no or very little new data.

Proving Theorems using Incremental Learning and Hindsight Experience Replay

no code implementations 20 Dec 2021 Eser Aygün, Laurent Orseau, Ankit Anand, Xavier Glorot, Vlad Firoiu, Lei M. Zhang, Doina Precup, Shibl Mourad

Traditional automated theorem provers for first-order logic depend on speed-optimized search and many handcrafted heuristics that are designed to work best over a wide range of domains.

Automated Theorem Proving Incremental Learning

Flexible Option Learning

1 code implementation NeurIPS 2021 Martin Klissarov, Doina Precup

Temporal abstraction in reinforcement learning (RL), offers the promise of improving generalization and knowledge transfer in complex environments, by propagating information more efficiently over time.

Hierarchical Reinforcement Learning Transfer Learning

On the Expressivity of Markov Reward

no code implementations NeurIPS 2021 David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh

We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists.

Temporal Abstraction in Reinforcement Learning with the Successor Representation

no code implementations 12 Oct 2021 Marlos C. Machado, Andre Barreto, Doina Precup

In this paper, we argue that the successor representation (SR), which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions.

Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning

no code implementations 8 Sep 2021 Maziar Gomrokchi, Susan Amin, Hossein Aboutalebi, Alexander Wong, Doina Precup

While significant research advances have been made in the field of deep reinforcement learning, a major challenge to widespread industrial adoption of deep reinforcement learning that has recently surfaced but little explored is the potential vulnerability to privacy breaches.

Adversarial Attack OpenAI Gym

A Survey of Exploration Methods in Reinforcement Learning

no code implementations 1 Sep 2021 Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup

Exploration is an essential component of reinforcement learning algorithms, where agents need to learn how to predict and control unknown and often stochastic environments.

Temporally Abstract Partial Models

1 code implementation NeurIPS 2021 Khimya Khetarpal, Zafarali Ahmed, Gheorghe Comanici, Doina Precup

Humans and animals have the ability to reason and make predictions about different courses of action at many time scales.

Policy Gradients Incorporating the Future

no code implementations 4 Aug 2021 David Venuto, Elaine Lau, Doina Precup, Ofir Nachum

Reasoning about the future -- understanding how decisions in the present time affect outcomes in the future -- is one of the central challenges for reinforcement learning (RL), especially in highly-stochastic or partially observable environments.

Offline RL

The Option Keyboard: Combining Skills in Reinforcement Learning

no code implementations NeurIPS 2019 André Barreto, Diana Borsa, Shaobo Hou, Gheorghe Comanici, Eser Aygün, Philippe Hamel, Daniel Toyama, Jonathan Hunt, Shibl Mourad, David Silver, Doina Precup

Building on this insight and on previous results on transfer learning, we show how to approximate options whose cumulants are linear combinations of the cumulants of known options.

Transfer Learning

Randomized Exploration for Reinforcement Learning with General Value Function Approximation

no code implementations 15 Jun 2021 Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang

We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle.

A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation

1 code implementation 12 Jun 2021 Scott Fujimoto, David Meger, Doina Precup

We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.

Preferential Temporal Difference Learning

1 code implementation 11 Jun 2021 Nishanth Anand, Doina Precup

When the agent lands in a state, its value can be used to compute the TD-error, which is then propagated to other states.

Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

1 code implementation NeurIPS 2021 Emmanuel Bengio, Moksh Jain, Maksym Korablyov, Doina Precup, Yoshua Bengio

Using insights from Temporal Difference learning, we propose GFlowNet, based on a view of the generative process as a flow network, making it possible to handle the tricky case where different trajectories can yield the same final state, e.g., there are many ways to sequentially add atoms to generate some molecular graph.

Correcting Momentum in Temporal Difference Learning

1 code implementation 7 Jun 2021 Emmanuel Bengio, Joelle Pineau, Doina Precup

A common optimization tool used in deep reinforcement learning is momentum, which consists in accumulating and discounting past gradients, reapplying them at each iteration.

A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

1 code implementation NeurIPS 2021 Mingde Zhao, Zhen Liu, Sitao Luan, Shuyuan Zhang, Doina Precup, Yoshua Bengio

We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state during planning.

Model-based Reinforcement Learning

AndroidEnv: A Reinforcement Learning Platform for Android

2 code implementations 27 May 2021 Daniel Toyama, Philippe Hamel, Anita Gergely, Gheorghe Comanici, Amelia Glaese, Zafarali Ahmed, Tyler Jackson, Shibl Mourad, Doina Precup

We introduce AndroidEnv, an open-source platform for Reinforcement Learning (RL) research built on top of the Android ecosystem.

What is Going on Inside Recurrent Meta Reinforcement Learning Agents?

no code implementations 29 Apr 2021 Safa Alver, Doina Precup

Recurrent meta reinforcement learning (meta-RL) agents are agents that employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".

Meta Reinforcement Learning

Training a First-Order Theorem Prover from Synthetic Data

no code implementations 5 Mar 2021 Vlad Firoiu, Eser Aygun, Ankit Anand, Zafarali Ahmed, Xavier Glorot, Laurent Orseau, Lei Zhang, Doina Precup, Shibl Mourad

A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models.

Automated Theorem Proving

Variance Penalized On-Policy and Off-Policy Actor-Critic

1 code implementation 3 Feb 2021 Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent.

Conditional Networks

no code implementations 1 Jan 2021 Anthony Ortiz, Kris Sankaran, Olac Fuentes, Christopher Kiekintveld, Pascal Vincent, Yoshua Bengio, Doina Precup

In this work we tackle the problem of out-of-distribution generalization through conditional computation.

Image Classification Semantic Segmentation

Offline Policy Optimization with Variance Regularization

no code implementations 1 Jan 2021 Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Zhaoran Wang, Animesh Garg, Lihong Li, Doina Precup

Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications.

Continuous Control Offline RL

Practical Marginalized Importance Sampling with the Successor Representation

no code implementations 1 Jan 2021 Scott Fujimoto, David Meger, Doina Precup

We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.

Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards

no code implementations 26 Dec 2020 Susan Amin, Maziar Gomrokchi, Hossein Aboutalebi, Harsh Satija, Doina Precup

A major challenge in reinforcement learning is the design of exploration strategies, especially for environments with sparse reward structures and continuous state and action spaces.

Continuous Control

Towards Continual Reinforcement Learning: A Review and Perspectives

no code implementations 25 Dec 2020 Khimya Khetarpal, Matthew Riemer, Irina Rish, Doina Precup

In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL.

Continual Learning

On Efficiency in Hierarchical Reinforcement Learning

no code implementations NeurIPS 2020 Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh

Hierarchical Reinforcement Learning (HRL) approaches promise to provide more efficient solutions to sequential decision making problems, both in terms of statistical as well as computational efficiency.

Decision Making Hierarchical Reinforcement Learning

Gradient Starvation: A Learning Proclivity in Neural Networks

2 code implementations NeurIPS 2021 Mohammad Pezeshki, Sékou-Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie

We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks.

Diversity-Enriched Option-Critic

1 code implementation 4 Nov 2020 Anand Kamat, Doina Precup

We show empirically that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks, outperforms option-critic by a wide margin.

Continuous Control

Forethought and Hindsight in Credit Assignment

no code implementations NeurIPS 2020 Veronica Chelu, Doina Precup, Hado van Hasselt

We address the problem of credit assignment in reinforcement learning and explore fundamental questions regarding the way in which an agent can best use additional computation to propagate new information, by planning with internal models of the world to improve its predictions.

Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning

no code implementations 19 Oct 2020 Tianyu Li, Doina Precup, Guillaume Rabusseau

In this paper, we present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks which encompasses a set of optimization techniques for high-order tensors used in quantum physics and numerical analysis.

Tensor Networks

A Fully Tensorized Recurrent Neural Network

1 code implementation 8 Oct 2020 Charles C. Onu, Jacob E. Miller, Doina Precup

Recurrent neural networks (RNNs) are powerful tools for sequential modeling, but typically require significant overparameterization and regularization to achieve optimal performance.

Image Classification Speaker Verification

Reward Propagation Using Graph Convolutional Networks

1 code implementation NeurIPS 2020 Martin Klissarov, Doina Precup

Potential-based reward shaping provides an approach for designing good reward functions, with the purpose of speeding up learning.

Graph Representation Learning

Complete the Missing Half: Augmenting Aggregation Filtering with Diversification for Graph Convolutional Networks

no code implementations 20 Aug 2020 Sitao Luan, Mingde Zhao, Chenqing Hua, Xiao-Wen Chang, Doina Precup

The core operation of current Graph Neural Networks (GNNs) is the aggregation enabled by the graph Laplacian or message passing, which filters the neighborhood node information.

Graph Classification Node Classification

Training Matters: Unlocking Potentials of Deeper Graph Convolutional Neural Networks

no code implementations 20 Aug 2020 Sitao Luan, Mingde Zhao, Xiao-Wen Chang, Doina Precup

The performance limit of Graph Convolutional Networks (GCNs) and the fact that we cannot stack more of them to increase the performance, which we usually do for other deep learning paradigms, are pervasively thought to be caused by the limitations of the GCN layers, including insufficient expressive power, etc.

An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay

1 code implementation NeurIPS 2020 Scott Fujimoto, David Meger, Doina Precup

Prioritized Experience Replay (PER) is a deep reinforcement learning technique in which agents learn from transitions sampled with non-uniform probability proportionate to their temporal-difference error.

What can I do here? A Theory of Affordances in Reinforcement Learning

no code implementations ICML 2020 Khimya Khetarpal, Zafarali Ahmed, Gheorghe Comanici, David Abel, Doina Precup

Gibson (1977) coined the term "affordances" to describe the fact that certain states enable an agent to do certain actions, in the context of embodied agents.

Learning to Prove from Synthetic Theorems

no code implementations 19 Jun 2020 Eser Aygün, Zafarali Ahmed, Ankit Anand, Vlad Firoiu, Xavier Glorot, Laurent Orseau, Doina Precup, Shibl Mourad

A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models.

Automated Theorem Proving

A Brief Look at Generalization in Visual Meta-Reinforcement Learning

no code implementations ICML Workshop LifelongML 2020 Safa Alver, Doina Precup

Due to the realization that deep reinforcement learning algorithms trained on high-dimensional tasks can strongly overfit to their training environments, there have been several studies that investigated the generalization performance of these algorithms.

Meta Reinforcement Learning

Gifting in multi-agent reinforcement learning

1 code implementation AAMAS '20: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems 2020 Andrei Lupu, Doina Precup

Multi-agent reinforcement learning has generally been studied under an assumption inherited from classical reinforcement learning: that the reward function is the exclusive property of the environment, and is only altered by external factors.

Multi-agent Reinforcement Learning

Learning to cooperate: Emergent communication in multi-agent navigation

no code implementations 2 Apr 2020 Ivana Kajić, Eser Aygün, Doina Precup

Emergent communication in artificial agents has been studied to understand language evolution, as well as to develop artificial systems that learn to communicate with humans.

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

no code implementations 27 Mar 2020 Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare

We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.

Q-Learning

Interference and Generalization in Temporal Difference Learning

no code implementations ICML 2020 Emmanuel Bengio, Joelle Pineau, Doina Precup

We study the link between generalization and interference in temporal-difference (TD) learning.

Invariant Causal Prediction for Block MDPs

1 code implementation ICML 2020 Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup

Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.

Causal Inference Variable Selection

Policy Evaluation Networks

no code implementations 26 Feb 2020 Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon

The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states.

Exploring Bayesian Deep Learning Uncertainty Measures for Segmentation of New Lesions in Longitudinal MRIs

no code implementations MIDL 2019 Nazanin Mohammadi Sepahvand, Raghav Mehta, Douglas Lorne Arnold, Doina Precup, Tal Arbel

In this paper, we develop a modified U-Net architecture to accurately segment new and enlarging lesions in longitudinal MRI, based on multi-modal MRI inputs, as well as subtraction images between timepoints, in the context of large-scale clinical trial data for patients with Multiple Sclerosis (MS).

Options of Interest: Temporal Abstraction with Interest Functions

2 code implementations 1 Jan 2020 Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup

Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time.

Shaping representations through communication: community size effect in artificial learning systems

no code implementations 12 Dec 2019 Olivier Tieleman, Angeliki Lazaridou, Shibl Mourad, Charles Blundell, Doina Precup

Motivated by theories of language and communication that explain why communities with large numbers of speakers have, on average, simpler languages with more regularity, we cast the representation learning problem in terms of learning to communicate.

Representation Learning

Marginalized State Distribution Entropy Regularization in Policy Optimization

no code implementations 11 Dec 2019 Riashat Islam, Zafarali Ahmed, Doina Precup

Entropy regularization is used to get improved optimization performance in reinforcement learning tasks.

Continuous Control

Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods

no code implementations 11 Dec 2019 Riashat Islam, Raihan Seraj, Pierre-Luc Bacon, Doina Precup

In this work, we propose exploration in policy gradient methods based on maximizing entropy of the discounted future state distribution.

Policy Gradient Methods

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

no code implementations 11 Dec 2019 Riashat Islam, Raihan Seraj, Samin Yeasar Arnob, Doina Precup

Furthermore, in cases where the reward function is stochastic that can lead to high variance, doubly robust critic estimation can improve performance under corrupted, stochastic reward signals, indicating its usefulness for robust and safe reinforcement learning.

Continuous Control Safe Reinforcement Learning

Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction

no code implementations 28 Nov 2019 Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare

Empirically, we find that these techniques improve the performance of a baseline deep reinforcement learning agent applied to text-based games.

text-based games

Option-Critic in Cooperative Multi-agent Systems

1 code implementation 28 Nov 2019 Jhelum Chakravorty, Nadeem Ward, Julien Roy, Maxime Chevalier-Boisvert, Sumana Basu, Andrei Lupu, Doina Precup

In this paper, we investigate learning temporal abstractions in cooperative multi-agent systems, using the options framework (Sutton et al, 1999).

Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning

no code implementations 12 Nov 2019 Tianyu Li, Bogdan Mazoure, Doina Precup, Guillaume Rabusseau

Learning and planning in partially-observable domains is one of the most difficult problems in reinforcement learning.

Navigation Agents for the Visually Impaired: A Sidewalk Simulator and Experiments

1 code implementation 29 Oct 2019 Martin Weiss, Simon Chamorro, Roger Girgis, Margaux Luck, Samira E. Kahou, Joseph P. Cohen, Derek Nowrouzezahrai, Doina Precup, Florian Golemo, Chris Pal

In our endeavor to create a navigation assistant for the BVI, we found that existing Reinforcement Learning (RL) environments were unsuitable for the task.

Actor Critic with Differentially Private Critic

no code implementations 14 Oct 2019 Jonathan Lebensold, William Hamilton, Borja Balle, Doina Precup

Reinforcement learning algorithms are known to be sample inefficient, and often performance on one task can be substantially improved by leveraging information (e.g., via pre-training) on other related tasks.

Transfer Learning

Augmenting learning using symmetry in a biologically-inspired domain

no code implementations 1 Oct 2019 Shruti Mishra, Abbas Abdolmaleki, Arthur Guez, Piotr Trochim, Doina Precup

Invariances to translation, rotation and other spatial transformations are a hallmark of the laws of motion, and have widespread use in the natural sciences to reduce the dimensionality of systems of equations.

Data Augmentation Image Classification

Assessing Generalization in TD methods for Deep Reinforcement Learning

no code implementations 25 Sep 2019 Emmanuel Bengio, Doina Precup, Joelle Pineau

Current Deep Reinforcement Learning (DRL) methods can exhibit both data inefficiency and brittleness, which seem to indicate that they generalize poorly.

Avoidance Learning Using Observational Reinforcement Learning

1 code implementation 24 Sep 2019 David Venuto, Leonard Boussioux, Junhao Wang, Rola Dali, Jhelum Chakravorty, Yoshua Bengio, Doina Precup

We define avoidance learning as the process of optimizing the agent's reward while avoiding dangerous behaviors given by a demonstrator.

Imitation Learning

Revisit Policy Optimization in Matrix Form

no code implementations 19 Sep 2019 Sitao Luan, Xiao-Wen Chang, Doina Precup

In tabular case, when the reward and environment dynamics are known, policy evaluation can be written as $\bm{V}_{\bm{\pi}} = (I - \gamma P_{\bm{\pi}})^{-1} \bm{r}_{\bm{\pi}}$, where $P_{\bm{\pi}}$ is the state transition matrix given policy ${\bm{\pi}}$ and $\bm{r}_{\bm{\pi}}$ is the reward signal given ${\bm{\pi}}$.

Model-based Reinforcement Learning
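
The matrix-form policy evaluation quoted in the entry above, $V_\pi = (I - \gamma P_\pi)^{-1} r_\pi$, can be sketched numerically. The following is a minimal illustration, not the authors' code: the two-state transition matrix, the reward vector, the discount factor and the helper name `evaluate_policy` are all hypothetical, chosen only to show the linear solve.

```python
import numpy as np


def evaluate_policy(P_pi, r_pi, gamma):
    """Return V_pi by solving (I - gamma * P_pi) V = r_pi.

    Solving the linear system avoids forming the matrix inverse explicitly.
    """
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)


# Hypothetical two-state chain under a fixed policy (illustrative values only).
P_pi = np.array([[0.9, 0.1],   # state 0 mostly stays in state 0
                 [0.0, 1.0]])  # state 1 is absorbing
r_pi = np.array([1.0, 0.0])    # expected one-step reward in each state

V = evaluate_policy(P_pi, r_pi, gamma=0.5)
print(V)
```

The result satisfies the Bellman equation $V_\pi = r_\pi + \gamma P_\pi V_\pi$, which is a quick way to check the solve.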

Self-supervised Learning of Distance Functions for Goal-Conditioned Reinforcement Learning

no code implementations 5 Jul 2019 Srinivas Venkattaramanujam, Eric Crawford, Thang Doan, Doina Precup

Goal-conditioned policies are used in order to break down complex reinforcement learning (RL) problems by using subgoals, which can be defined either in state space or in a latent feature space.

Self-Supervised Learning

Neural Transfer Learning for Cry-based Diagnosis of Perinatal Asphyxia

no code implementations 24 Jun 2019 Charles C. Onu, Jonathan Lebensold, William L. Hamilton, Doina Precup

Despite continuing medical advances, the rate of newborn morbidity and mortality globally remains high, with over 6 million casualties every year.

Transfer Learning

SVRG for Policy Evaluation with Fewer Gradient Evaluations

1 code implementation 9 Jun 2019 Zilun Peng, Ahmed Touati, Pascal Vincent, Doina Precup

SVRG was later shown to work for policy evaluation, a problem in reinforcement learning in which one aims to estimate the value function of a given policy.

Break the Ceiling: Stronger Multi-scale Deep Graph Convolutional Networks

1 code implementation NeurIPS 2019 Sitao Luan, Mingde Zhao, Xiao-Wen Chang, Doina Precup

Recently, neural network based approaches have achieved significant improvement for solving large, complex, graph-structured problems.

Node Classification

Recurrent Value Functions

no code implementations 23 May 2019 Pierre Thodoroff, Nishanth Anand, Lucas Caccia, Doina Precup, Joelle Pineau

Despite recent successes in Reinforcement Learning, value-based methods often suffer from high variance hindering performance.

Continuous Control

META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation

2 code implementations 25 Apr 2019 Mingde Zhao, Sitao Luan, Ian Porada, Xiao-Wen Chang, Doina Precup

Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn how to improve policies.

Meta-Learning

Uncertainty Aware Learning from Demonstrations in Multiple Contexts using Bayesian Neural Networks

1 code implementation 13 Mar 2019 Sanjay Thakur, Herke van Hoof, Juan Camilo Gamboa Higuera, Doina Precup, David Meger

Learned controllers such as neural networks typically do not have a notion of uncertainty that allows to diagnose an offset between training and testing conditions, and potentially intervene.

Learning Modular Safe Policies in the Bandit Setting with Application to Adaptive Clinical Trials

no code implementations 4 Mar 2019 Hossein Aboutalebi, Doina Precup, Tibor Schuster

We present a regret bound for our approach and evaluate it empirically both on synthetic problems as well as on a dataset from the clinical trial literature.

The Termination Critic

no code implementations 26 Feb 2019 Anna Harutyunyan, Will Dabney, Diana Borsa, Nicolas Heess, Remi Munos, Doina Precup

In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents.

Clustering-Oriented Representation Learning with Attractive-Repulsive Loss

1 code implementation 18 Dec 2018 Kian Kenyon-Dean, Andre Cianflone, Lucas Page-Caccia, Guillaume Rabusseau, Jackie Chi Kit Cheung, Doina Precup

The standard loss function used to train neural network classifiers, categorical cross-entropy (CCE), seeks to maximize accuracy on the training data; building useful representations is not a necessary byproduct of this objective.

General Classification Representation Learning

Off-Policy Deep Reinforcement Learning without Exploration

10 code implementations 7 Dec 2018 Scott Fujimoto, David Meger, Doina Precup

Many practical applications of reinforcement learning constrain agents to learn from a fixed batch of data which has already been gathered, without offering further possibility for data collection.

Continuous Control

Environments for Lifelong Reinforcement Learning

2 code implementations 26 Nov 2018 Khimya Khetarpal, Shagun Sodhani, Sarath Chandar, Doina Precup

To achieve general artificial intelligence, reinforcement learning (RL) agents should learn not only to optimize returns for one specific task but also to constantly build more complex skills and scaffold their knowledge about the world, without forgetting what has already been learned.

The Barbados 2018 List of Open Issues in Continual Learning

no code implementations 16 Nov 2018 Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.

Continual Learning

Temporal Regularization in Markov Decision Process

2 code implementations 1 Nov 2018 Pierre Thodoroff, Audrey Durand, Joelle Pineau, Doina Precup

Several applications of Reinforcement Learning suffer from instability due to high variance.

Atari Games

Where Off-Policy Deep Reinforcement Learning Fails

no code implementations 27 Sep 2018 Scott Fujimoto, David Meger, Doina Precup

This work examines batch reinforcement learning--the task of maximally exploiting a given batch of off-policy data, without further data collection.

Continuous Control

Combined Reinforcement Learning via Abstract Representations

1 code implementation 12 Sep 2018 Vincent François-Lavet, Yoshua Bengio, Doina Precup, Joelle Pineau

In the quest for efficient and robust reinforcement learning methods, both model-free and model-based approaches offer advantages.

Transfer Learning

A Semi-Markov Chain Approach to Modeling Respiratory Patterns Prior to Extubation in Preterm Infants

no code implementations 24 Aug 2018 Charles C. Onu, Lara J. Kanbar, Wissam Shalish, Karen A. Brown, Guilherme M. Sant'Anna, Robert E. Kearney, Doina Precup

After birth, extremely preterm infants often require specialized respiratory management in the form of invasive mechanical ventilation (IMV).

Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation

1 code implementation 3 Aug 2018 Tanya Nair, Doina Precup, Douglas L. Arnold, Tal Arbel

We present the first exploration of multiple uncertainty estimates based on Monte Carlo (MC) dropout [4] in the context of deep networks for lesion detection and segmentation in medical images.

Lesion Segmentation

Attend Before you Act: Leveraging human visual attention for continual learning

1 code implementation 25 Jul 2018 Khimya Khetarpal, Doina Precup

When humans perform a task, such as playing a game, they selectively pay attention to certain parts of the visual input, gathering relevant information and sequentially combining it to build a representation from the sensory data.

Continual Learning Decision Making

Safe Option-Critic: Learning Safety in the Option-Critic Architecture

1 code implementation 21 Jul 2018 Arushi Jain, Khimya Khetarpal, Doina Precup

We propose an optimization objective that learns safe options by encouraging the agent to visit states with higher behavioural consistency.

Atari Games Hierarchical Reinforcement Learning

Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning

no code implementations 4 Jul 2018 Guillaume Rabusseau, Tianyu Li, Doina Precup

In this paper, we unravel a fundamental connection between weighted finite automata (WFAs) and second-order recurrent neural networks (2-RNNs): in the case of sequences of discrete symbols, WFAs and 2-RNNs with linear activation functions are expressively equivalent.

Dyna Planning using a Feature Based Generative Model

no code implementations 23 May 2018 Ryan Faulkner, Doina Precup

Dyna-style reinforcement learning is a powerful approach for problems where not much real data is available.

Learning Safe Policies with Expert Guidance

no code implementations NeurIPS 2018 Jessie Huang, Fa Wu, Doina Precup, Yang Cai

We propose a framework for ensuring safe behavior of a reinforcement learning agent when the reward function may be difficult to specify.

Learning Robust Options

no code implementations 9 Feb 2018 Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor

We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.

Learnings Options End-to-End for Continuous Action Tasks

1 code implementation 30 Nov 2017 Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup

We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]).

Learning with Options that Terminate Off-Policy

no code implementations 10 Nov 2017 Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe

Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient.

Deep Reinforcement Learning that Matters

6 code implementations 19 Sep 2017 Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, David Meger

In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL).

When Waiting is not an Option : Learning Options with a Deliberation Cost

1 code implementation 14 Sep 2017 Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup

Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance.

Atari Games

Neural Network Based Nonlinear Weighted Finite Automata

no code implementations 13 Sep 2017 Tianyu Li, Guillaume Rabusseau, Doina Precup

Weighted finite automata (WFA) can expressively model functions defined over strings but are inherently linear models.

World Knowledge for Reading Comprehension: Rare Entity Prediction with Hierarchical LSTMs Using External Descriptions

no code implementations EMNLP 2017 Teng Long, Emmanuel Bengio, Ryan Lowe, Jackie Chi Kit Cheung, Doina Precup

Humans interpret texts with respect to some background information, or world knowledge, and we would like to develop automatic reading comprehension systems that can do the same.

Language Modelling Reading Comprehension

Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control

1 code implementation 10 Aug 2017 Riashat Islam, Peter Henderson, Maziar Gomrokchi, Doina Precup

We investigate and discuss: the significance of hyper-parameters in policy gradients for continuous control, general variance in the algorithms, and reproducibility of reported results.

Continuous Control Policy Gradient Methods

Independently Controllable Factors

no code implementations 3 Aug 2017 Valentin Thomas, Jules Pondard, Emmanuel Bengio, Marc Sarfati, Philippe Beaudoin, Marie-Jean Meurs, Joelle Pineau, Doina Precup, Yoshua Bengio

It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation.

Variational Generative Stochastic Networks with Collaborative Shaping

1 code implementation 2 Aug 2017 Philip Bachman, Doina Precup

We develop an approach to training generative models based on unrolling a variational auto-encoder into a Markov chain, and shaping the chain's trajectories using a technique inspired by recent work in Approximate Bayesian computation.

Convergent Tree Backup and Retrace with Function Approximation

no code implementations ICML 2018 Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

Off-policy learning is key to scaling up reinforcement learning, as it allows an agent to learn about a target policy from the experience generated by a different behavior policy.

Investigating Recurrence and Eligibility Traces in Deep Q-Networks

no code implementations 18 Apr 2017 Jean Harb, Doina Precup

Eligibility traces in reinforcement learning are used as a bias-variance trade-off and can often speed up training time by propagating knowledge back over time-steps in a single update.

Atari Games

Independently Controllable Features

no code implementations 22 Mar 2017 Emmanuel Bengio, Valentin Thomas, Joelle Pineau, Doina Precup, Yoshua Bengio

Finding features that disentangle the different causes of variation in real data is a difficult task that has nonetheless received considerable attention in static domains like natural images.

Multi-Timescale, Gradient Descent, Temporal Difference Learning with Linear Options

no code implementations 19 Mar 2017 Peeyush Kumar, Doina Precup

Deliberating on large or continuous state spaces has been a long-standing challenge in reinforcement learning.

A Matrix Splitting Perspective on Planning with Options

no code implementations 3 Dec 2016 Pierre-Luc Bacon, Doina Precup

We show that the Bellman operator underlying the options framework leads to a matrix splitting, an approach traditionally used to speed up convergence of iterative solvers for large linear systems of equations.

The Option-Critic Architecture

9 code implementations 16 Sep 2016 Pierre-Luc Bacon, Jean Harb, Doina Precup

Temporal abstraction is key to scaling up learning and planning in reinforcement learning.

Leveraging Lexical Resources for Learning Entity Embeddings in Multi-Relational Data

no code implementations ACL 2016 Teng Long, Ryan Lowe, Jackie Chi Kit Cheung, Doina Precup

Recent work in learning vector-space embeddings for multi-relational data has focused on combining relational information derived from knowledge bases with distributional information derived from large text corpora.

Entity Embeddings

Differentially Private Policy Evaluation

no code implementations 7 Mar 2016 Borja Balle, Maziar Gomrokchi, Doina Precup

We present the first differentially private algorithms for reinforcement learning, which apply to the task of evaluating a fixed policy.

Policy Gradient Methods for Off-policy Control

no code implementations 13 Dec 2015 Lucas Lehnert, Doina Precup

Off-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy.

Policy Gradient Methods

Basis refinement strategies for linear value function approximation in MDPs

no code implementations NeurIPS 2015 Gheorghe Comanici, Doina Precup, Prakash Panangaden

We provide a theoretical framework for analyzing basis function construction for linear value function approximation in Markov Decision Processes (MDPs).

Conditional Computation in Neural Networks for faster models

1 code implementation 19 Nov 2015 Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup

In this paper, we use reinforcement learning as a tool to optimize conditional computation policies.

Data Generation as Sequential Decision Making

1 code implementation NeurIPS 2015 Philip Bachman, Doina Precup

We connect a broad class of generative models through their shared reliance on sequential decision making.

Decision Making Imputation

Learning with Pseudo-Ensembles

no code implementations NeurIPS 2014 Philip Bachman, Ouais Alsharif, Doina Precup

We formalize the notion of a pseudo-ensemble, a (possibly infinite) collection of child models spawned from a parent model by perturbing it according to some noise process.

Sentiment Analysis

Optimizing Energy Production Using Policy Search and Predictive State Representations

no code implementations NeurIPS 2014 Yuri Grinberg, Doina Precup, Michel Gendreau

We consider the challenging practical problem of optimizing the power production of a complex of hydroelectric power plants, which involves control over three continuous action variables, uncertainty in the amount of water inflows and a variety of constraints that need to be satisfied.

Practical Kernel-Based Reinforcement Learning

no code implementations 21 Jul 2014 André M. S. Barreto, Doina Precup, Joelle Pineau

In this paper we introduce an algorithm that turns KBRL into a practical reinforcement learning tool.

Classification-based Approximate Policy Iteration: Experiments and Extended Discussions

no code implementations 2 Jul 2014 Amir-Massoud Farahmand, Doina Precup, André M. S. Barreto, Mohammad Ghavamzadeh

We introduce a general classification-based approximate policy iteration (CAPI) framework, which encompasses a large class of algorithms that can exploit regularities of both the value function and the policy space, depending on what is advantageous.

General Classification

Iterative Multilevel MRF Leveraging Context and Voxel Information for Brain Tumour Segmentation in MRI

no code implementations CVPR 2014 Nagesh Subbanna, Doina Precup, Tal Arbel

In this paper, we introduce a fully automated multistage graphical probabilistic framework to segment brain tumours from multimodal Magnetic Resonance Images (MRIs) acquired from real patients.

Tumour Classification

Algorithms for multi-armed bandit problems

no code implementations 25 Feb 2014 Volodymyr Kuleshov, Doina Precup

Although the design of clinical trials has been one of the principal practical problems motivating research on multi-armed bandits, bandit algorithms have never been evaluated as potential treatment allocation strategies.

Multi-Armed Bandits

Bellman Error Based Feature Generation using Random Projections on Sparse Spaces

no code implementations NeurIPS 2013 Mahdi Milani Fard, Yuri Grinberg, Amir-Massoud Farahmand, Joelle Pineau, Doina Precup

This paper addresses the problem of automatic generation of features for value function approximation in reinforcement learning.

Value Pursuit Iteration

no code implementations NeurIPS 2012 Amir M. Farahmand, Doina Precup

VPI has two main features: First, it is a nonparametric algorithm that finds a good sparse approximation of the optimal value function given a dictionary of features.

On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization

no code implementations NeurIPS 2012 Doina Precup, Joelle Pineau, Andre S. Barreto

The ability to learn a policy for a sequential decision problem with continuous state space using on-line data is a long-standing challenge.

Reinforcement Learning using Kernel-Based Stochastic Factorization

no code implementations NeurIPS 2011 Andre S. Barreto, Doina Precup, Joelle Pineau

Kernel-based reinforcement-learning (KBRL) is a method for learning a decision policy from a set of sample transitions which stands out for its strong theoretical guarantees.

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

no code implementations NeurIPS 2009 Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári

We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks.

Q-Learning

Bounding Performance Loss in Approximate MDP Homomorphisms

no code implementations NeurIPS 2008 Jonathan Taylor, Doina Precup, Prakash Panangaden

We prove that the difference in the optimal value function of different states can be upper-bounded by the value of this metric, and that the bound is tighter than that provided by bisimulation metrics (Ferns et al. 2004, 2005).
