Search Results for author: Harm van Seijen

Found 18 papers, 11 papers with code

Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning

1 code implementation 30 Sep 2023 Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio

Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations.

Decision Making Model-based Reinforcement Learning +2

Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

2 code implementations 31 Oct 2022 Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford

We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time dependent process, which is prevalent in practical applications.

Offline RL Reinforcement Learning (RL) +1

Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods

1 code implementation 25 Apr 2022 Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm van Seijen

We empirically validate these insights in the case of linear function approximation by demonstrating that a modified version of linear Dyna achieves effective adaptation to local changes.

Model-based Reinforcement Learning reinforcement-learning +1

Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks

1 code implementation 13 Jul 2021 Sungryull Sohn, Sungtae Lee, Jongwook Choi, Harm van Seijen, Mehdi Fatemi, Honglak Lee

We propose the k-Shortest-Path (k-SP) constraint: a novel constraint on the agent's trajectory that improves the sample efficiency in sparse-reward MDPs.

Continuous Control reinforcement-learning +1

Systematic generalisation with group invariant predictions

no code implementations ICLR 2021 Faruk Ahmed, Yoshua Bengio, Harm van Seijen, Aaron Courville

We consider situations where the presence of dominant simpler correlations with the target variable in a training set can cause an SGD-trained neural network to be less reliant on more persistently-correlating complex features.

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

1 code implementation 2 Oct 2020 Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes

In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.

Representation Learning

The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning

2 code implementations NeurIPS 2020 Harm van Seijen, Hadi Nekoei, Evan Racah, Sarath Chandar

For example, the common single-task sample-efficiency metric conflates improvements due to model-based learning with various other aspects, such as representation learning, making it difficult to assess true progress on model-based RL.

Model-based Reinforcement Learning Reinforcement Learning (RL) +1

Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

2 code implementations NeurIPS 2019 Harm van Seijen, Mehdi Fatemi, Arash Tavakoli

In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation.

General Reinforcement Learning reinforcement-learning +1

Learning Invariances for Policy Generalization

1 code implementation 7 Sep 2018 Remi Tachet, Philip Bachman, Harm van Seijen

While recent progress has spawned very powerful machine learning systems, those agents remain extremely specialized and fail to transfer the knowledge they gain to similar yet unseen tasks.

BIG-bench Machine Learning Data Augmentation +3

Separation of Concerns in Reinforcement Learning

no code implementations 15 Dec 2016 Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche

In this paper, we propose a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task.

reinforcement-learning Reinforcement Learning (RL)

Effective Multi-step Temporal-Difference Learning for Non-Linear Function Approximation

no code implementations 18 Aug 2016 Harm van Seijen

Furthermore, based on our analysis, we propose a new multi-step TD method for non-linear function approximation that addresses this issue.

True Online Temporal-Difference Learning

1 code implementation 13 Dec 2015 Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton

Our results suggest that the true online methods indeed dominate the regular methods.

Atari Games

An Empirical Evaluation of True Online TD(λ)

no code implementations 1 Jul 2015 Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton

Our results confirm the strength of true online TD(λ): 1) for sparse feature vectors, the computational overhead with respect to TD(λ) is minimal; for non-sparse features the computation time is at most twice that of TD(λ), 2) across all domains/representations the learning speed of true online TD(λ) is often better, but never worse than that of TD(λ), and 3) true online TD(λ) is easier to use, because it does not require choosing between trace types, and it is generally more stable with respect to the step-size.
