1 code implementation • 30 Sep 2023 • Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations.
no code implementations • 15 Mar 2023 • Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Harm van Seijen, Sarath Chandar
This is challenging for deep-learning-based world models due to catastrophic forgetting.
Model-based Reinforcement Learning • Reinforcement Learning +1
2 code implementations • 31 Oct 2022 • Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford
We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time dependent process, which is prevalent in practical applications.
1 code implementation • ICLR 2022 • Jorge A. Mendez, Harm van Seijen, Eric Eaton
Empirically, we demonstrate that neural composition indeed captures the underlying structure of this space of problems.
1 code implementation • 25 Apr 2022 • Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm van Seijen
We empirically validate these insights in the case of linear function approximation by demonstrating that a modified version of linear Dyna achieves effective adaptation to local changes (a reference sketch of the standard linear-Dyna loop follows below).
Model-based Reinforcement Learning • Reinforcement Learning +1
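For orientation only: the entry above refers to a modified version of linear Dyna, which is not reproduced here. Below is a minimal NumPy sketch of the standard linear-Dyna prediction loop (learn a linear model $F$, $b$ over features and run TD updates on imagined transitions); the function name, signature, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def linear_dyna_step(theta, F, b, phi, reward, phi_next,
                     alpha=0.1, beta=0.1, gamma=0.99, n_plan=10, rng=None):
    """One real step plus planning for policy-evaluation linear Dyna.

    F @ phi predicts the expected next feature vector and b @ phi the expected
    reward; theta holds the value weights.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Direct TD(0) update from the real transition.
    delta = reward + gamma * theta @ phi_next - theta @ phi
    theta = theta + alpha * delta * phi

    # Update the linear model from the real transition.
    F = F + beta * np.outer(phi_next - F @ phi, phi)
    b = b + beta * (reward - b @ phi) * phi

    # Planning: TD updates on imagined transitions generated by the model.
    # Sampling unit basis features is a simple illustrative choice.
    n_features = len(phi)
    for _ in range(n_plan):
        x = np.zeros(n_features)
        x[rng.integers(n_features)] = 1.0
        delta_plan = b @ x + gamma * theta @ (F @ x) - theta @ x
        theta = theta + alpha * delta_plan * x

    return theta, F, b
```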
no code implementations • 9 Mar 2022 • Nathaniel Weir, Xingdi Yuan, Marc-Alexandre Côté, Matthew Hausknecht, Romain Laroche, Ida Momennejad, Harm van Seijen, Benjamin Van Durme
Humans have the capability, aided by the expressive compositionality of their language, to learn quickly by demonstration.
1 code implementation • 13 Jul 2021 • Sungryull Sohn, Sungtae Lee, Jongwook Choi, Harm van Seijen, Mehdi Fatemi, Honglak Lee
We propose the k-Shortest-Path (k-SP) constraint: a novel constraint on the agent's trajectory that improves the sample efficiency in sparse-reward MDPs.
no code implementations • ICLR 2021 • Faruk Ahmed, Yoshua Bengio, Harm van Seijen, Aaron Courville
We consider situations where dominant, simpler correlations with the target variable in a training set can cause an SGD-trained neural network to rely less on complex features that correlate more persistently with the target.
1 code implementation • 2 Oct 2020 • Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes
In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$); we propose interpreting the omission of discounting in the actor update from an auxiliary-task perspective and provide supporting empirical results.
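For context (standard policy-gradient notation, not taken from the paper): the gradient of the discounted objective weights each time step by $\gamma^t$,

$$\nabla_\theta J_\gamma(\theta) = \mathbb{E}_{\pi_\theta}\Big[\sum_{t \ge 0} \gamma^{t}\, \nabla_\theta \log \pi_\theta(A_t \mid S_t)\, Q^{\pi_\theta}_\gamma(S_t, A_t)\Big],$$

whereas the actor update used in most implementations drops that factor, $\Delta\theta_t \propto \nabla_\theta \log \pi_\theta(A_t \mid S_t)\, \hat{Q}(S_t, A_t)$; the omission referred to above is this missing $\gamma^t$ weighting.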
2 code implementations • NeurIPS 2020 • Harm van Seijen, Hadi Nekoei, Evan Racah, Sarath Chandar
For example, the common single-task sample-efficiency metric conflates improvements due to model-based learning with various other aspects, such as representation learning, making it difficult to assess true progress on model-based RL.
Model-based Reinforcement Learning • Reinforcement Learning (RL) +1
2 code implementations • NeurIPS 2019 • Harm van Seijen, Mehdi Fatemi, Arash Tavakoli
In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation.
1 code implementation • 7 Sep 2018 • Remi Tachet, Philip Bachman, Harm van Seijen
While recent progress has spawned very powerful machine learning systems, these systems remain extremely specialized and fail to transfer the knowledge they gain to similar yet unseen tasks.
1 code implementation • NeurIPS 2017 • Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche, Tavian Barnes, Jeffrey Tsang
One of the main challenges in reinforcement learning (RL) is generalisation.
no code implementations • ICLR 2018 • Romain Laroche, Mehdi Fatemi, Joshua Romoff, Harm van Seijen
We consider tackling a single-agent RL problem by distributing it to $n$ learners.
no code implementations • 15 Dec 2016 • Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche
In this paper, we propose a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task.
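One common way to instantiate such a framework, assumed here purely for illustration, is reward decomposition in the spirit of the Hybrid Reward Architecture entry above: each agent learns a value function for its own reward component, and behaviour is greedy with respect to the summed values. A minimal tabular sketch (names and the bootstrapping choice are assumptions, not the paper's exact method):

```python
import numpy as np

def decomposed_q_update(Q, state, action, rewards, next_state,
                        alpha=0.1, gamma=0.99):
    """One update of reward-decomposition Q-learning.

    Q has shape (n_components, n_states, n_actions); rewards holds the
    per-component rewards for this transition. Each component k learns from
    reward component k; all components bootstrap on the action that is greedy
    w.r.t. the aggregated values (one of several possible variants).
    """
    next_action = np.argmax(Q.sum(axis=0)[next_state])
    for k in range(Q.shape[0]):
        target = rewards[k] + gamma * Q[k, next_state, next_action]
        Q[k, state, action] += alpha * (target - Q[k, state, action])
    return Q

def act(Q, state):
    """Select the action that maximizes the sum of the component values."""
    return int(np.argmax(Q.sum(axis=0)[state]))
```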
no code implementations • 18 Aug 2016 • Harm van Seijen
Furthermore, based on our analysis, we propose a new multi-step TD method for non-linear function approximation that addresses this issue.
1 code implementation • 13 Dec 2015 • Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton
Our results suggest that the true online methods indeed dominate the regular methods.
no code implementations • 1 Jul 2015 • Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton
Our results confirm the strength of true online TD($\lambda$): 1) for sparse feature vectors, the computational overhead with respect to TD($\lambda$) is minimal, and for non-sparse features the computation time is at most twice that of TD($\lambda$); 2) across all domains/representations, the learning speed of true online TD($\lambda$) is often better, and never worse, than that of TD($\lambda$); and 3) true online TD($\lambda$) is easier to use, because it does not require choosing between trace types and is generally more stable with respect to the step-size.
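For reference, a minimal NumPy sketch of the true online TD($\lambda$) update with dutch traces, following the published pseudocode; the `env_steps` format and the function signature are illustrative assumptions.

```python
import numpy as np

def true_online_td_lambda(env_steps, n_features, alpha=0.1, gamma=0.99, lam=0.9):
    """Run true online TD(lambda) with linear function approximation.

    `env_steps` is assumed to yield (phi, reward, phi_next) tuples of feature
    vectors for successive states; pass a zero vector as phi_next at a
    terminal transition so the terminal value is zero.
    """
    theta = np.zeros(n_features)   # value weights
    e = np.zeros(n_features)       # dutch eligibility trace
    v_old = 0.0                    # previous state's value under previous weights

    for phi, reward, phi_next in env_steps:
        v = theta @ phi
        v_next = theta @ phi_next
        delta = reward + gamma * v_next - v

        # Dutch-trace update.
        e = gamma * lam * e + phi - alpha * gamma * lam * (e @ phi) * phi

        # True online weight update.
        theta = theta + alpha * (delta + v - v_old) * e - alpha * (v - v_old) * phi

        v_old = v_next

    return theta
```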