Search Results for author: Scott Fujimoto

Found 17 papers, 10 papers with code

Imitation Learning from Observation through Optimal Transport

no code implementations · 2 Oct 2023 · Wei-Di Chang, Scott Fujimoto, David Meger, Gregory Dudek

Imitation Learning from Observation (ILfO) is a setting in which a learner tries to imitate the behavior of an expert, using only observational data and without the direct guidance of demonstrated actions.

Continuous Control · Imitation Learning
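The OT idea in this paper can be made concrete with a short sketch: pseudo-rewards for the learner derived from an optimal-transport coupling between learner and expert state trajectories. This illustrates the general recipe, not the paper's exact formulation; it assumes the POT library and squared-Euclidean costs.

```python
# Sketch: optimal-transport pseudo-rewards for imitation from observation.
# Assumes the POT library (pip install pot); an illustration of the general
# idea, not the paper's exact method.
import numpy as np
import ot  # Python Optimal Transport

def ot_pseudo_rewards(learner_states, expert_states):
    """Assign each learner state a reward from its transport cost to the
    expert's state trajectory (lower cost -> higher reward)."""
    M = ot.dist(learner_states, expert_states)   # pairwise squared-Euclidean costs
    a = np.full(len(learner_states), 1 / len(learner_states))
    b = np.full(len(expert_states), 1 / len(expert_states))
    T = ot.emd(a, b, M)                          # optimal coupling between trajectories
    # Per-state transport cost, negated to serve as a reward signal.
    return -(T * M).sum(axis=1) * len(learner_states)
```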

For SALE: State-Action Representation Learning for Deep Reinforcement Learning

2 code implementations · NeurIPS 2023 · Scott Fujimoto, Wei-Di Chang, Edward J. Smith, Shixiang Shane Gu, Doina Precup, David Meger

In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked for environments with low-level states, such as physical control problems.

Continuous Control · OpenAI Gym · +3
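The core of SALE is to learn embeddings of the state and of the state-action pair from low-level inputs by predicting the next state's embedding. A minimal PyTorch sketch; layer sizes and the exact loss are assumptions:

```python
# Sketch of the SALE idea: state embedding z_s = f(s) and state-action
# embedding z_sa = g(z_s, a), trained so z_sa predicts the next state's
# embedding. Architecture details here are illustrative assumptions.
import torch
import torch.nn as nn

class SALE(nn.Module):
    def __init__(self, state_dim, action_dim, emb_dim=256):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(state_dim, emb_dim), nn.ELU(),
                               nn.Linear(emb_dim, emb_dim))            # z_s = f(s)
        self.g = nn.Sequential(nn.Linear(emb_dim + action_dim, emb_dim), nn.ELU(),
                               nn.Linear(emb_dim, emb_dim))            # z_sa = g(z_s, a)

    def loss(self, s, a, s_next):
        z_sa = self.g(torch.cat([self.f(s), a], dim=-1))
        with torch.no_grad():                  # fixed target embedding
            z_next = self.f(s_next)
        return ((z_sa - z_next) ** 2).mean()   # dynamics-prediction loss
```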

Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

no code implementations · 28 Jan 2022 · Scott Fujimoto, David Meger, Doina Precup, Ofir Nachum, Shixiang Shane Gu

In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy.

Value prediction
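The distinction at issue can be written out in standard notation (the definitions below are the usual ones, not copied from the paper): the value error measures distance to the true Q^π, while the Bellman error only measures self-consistency under one application of the Bellman operator, and the title reflects that the latter can be small while the former is large.

```latex
% Value error vs. Bellman error for an estimate Q of Q^\pi
% (standard definitions; notation assumed).
\text{Value error:}\quad \big| Q(s,a) - Q^\pi(s,a) \big|
\qquad
\text{Bellman error:}\quad \Big| Q(s,a) - \big( r(s,a)
  + \gamma\, \mathbb{E}_{s',\, a' \sim \pi}\!\left[ Q(s',a') \right] \big) \Big|
```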

Why Should I Trust You, Bellman? Evaluating the Bellman Objective with Off-Policy Data

no code implementations · 29 Sep 2021 · Scott Fujimoto, David Meger, Doina Precup, Ofir Nachum, Shixiang Shane Gu

In this work, we analyze the effectiveness of the Bellman equation as a proxy objective for value prediction accuracy in off-policy evaluation.

Off-policy evaluation · Value prediction

A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation

1 code implementation · 12 Jun 2021 · Scott Fujimoto, David Meger, Doina Precup

We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.

Off-policy evaluation · reinforcement-learning
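For reference, the successor representation below is the standard definition over features φ; expressing the density ratio as a linear function of it is a sketch of the stated observation, with the weight vector ξ and how it is trained left as assumptions:

```latex
% Successor representation over features \phi (standard definition), and the
% linear form of the density ratio it enables (sketch; \xi and its training
% objective are assumptions here).
\psi^\pi(s,a) \;=\; \mathbb{E}_\pi\!\Big[\, \sum_{t=0}^{\infty} \gamma^t\, \phi(s_t, a_t)
  \;\Big|\; s_0 = s,\; a_0 = a \Big],
\qquad
\frac{d^\pi(s,a)}{d_{\mathcal{D}}(s,a)} \;\approx\; \psi^\pi(s,a)^{\!\top} \xi .
```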

Practical Marginalized Importance Sampling with the Successor Representation

no code implementations · 1 Jan 2021 · Scott Fujimoto, David Meger, Doina Precup

We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.

Off-policy evaluation · reinforcement-learning

An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay

1 code implementation · NeurIPS 2020 · Scott Fujimoto, David Meger, Doina Precup

Prioritized Experience Replay (PER) is a deep reinforcement learning technique in which agents learn from transitions sampled with non-uniform probability proportional to their temporal-difference error.

reinforcement-learning · Reinforcement Learning (RL)
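The sampling rule described above is easy to sketch. A toy flat-array version (real implementations use a sum-tree; alpha, beta, and eps are the usual PER hyperparameters, chosen illustratively):

```python
# Sketch of prioritized sampling: transitions are drawn with probability
# proportional to a power of their TD error, with importance-sampling
# weights correcting for the non-uniform sampling.
import numpy as np

def sample_prioritized(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    idx = np.random.choice(len(td_errors), size=batch_size, p=probs)
    weights = (len(td_errors) * probs[idx]) ** (-beta)  # IS correction
    return idx, weights / weights.max()                 # normalized for stability
```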

Benchmarking Batch Deep Reinforcement Learning Algorithms

4 code implementations · 3 Oct 2019 · Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau

Widely used deep reinforcement learning algorithms have been shown to fail in the batch setting: learning from a fixed dataset without interaction with the environment.

Benchmarking · Q-Learning · +2

Off-Policy Deep Reinforcement Learning without Exploration

10 code implementations · 7 Dec 2018 · Scott Fujimoto, David Meger, Doina Precup

Many practical applications of reinforcement learning constrain agents to learn from a fixed batch of data which has already been gathered, without offering further possibility for data collection.

Continuous Control · reinforcement-learning · +1
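The constraint defining this setting is simply that every gradient step draws from a fixed dataset and the environment is never queried. A minimal sketch, where the `agent.update` interface is hypothetical:

```python
# Sketch of the batch (offline) setting: the agent sees only a fixed dataset
# of transitions and never calls env.step(). The agent interface is assumed.
import numpy as np

def train_offline(agent, dataset, steps, batch_size=256):
    """dataset: dict of aligned arrays with keys 's', 'a', 'r', 's2', 'done'."""
    n = len(dataset["s"])
    for _ in range(steps):
        idx = np.random.randint(0, n, size=batch_size)   # sample the fixed batch only
        agent.update({k: v[idx] for k, v in dataset.items()})
        # No environment interaction anywhere: no exploration, no new data.
```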

Where Off-Policy Deep Reinforcement Learning Fails

no code implementations · 27 Sep 2018 · Scott Fujimoto, David Meger, Doina Precup

This work examines batch reinforcement learning: the task of maximally exploiting a given batch of off-policy data, without further data collection.

Continuous Control · reinforcement-learning · +1

Addressing Function Approximation Error in Actor-Critic Methods

67 code implementations · ICML 2018 · Scott Fujimoto, Herke van Hoof, David Meger

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies.

Continuous Control · OpenAI Gym · +3
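This is the TD3 paper; its central remedy for the overestimation described above is a clipped double-Q target that takes the minimum of two target critics. A sketch assuming PyTorch modules, with the noise constants set to commonly used defaults:

```python
# Clipped double-Q target in the style of TD3: taking the min of two target
# critics counteracts overestimation. Network objects are assumed PyTorch
# modules; constants follow commonly cited defaults, used illustratively.
import torch

def td3_target(r, s2, done, actor_t, q1_t, q2_t,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    with torch.no_grad():
        a2 = actor_t(s2)
        noise = (torch.randn_like(a2) * noise_std).clamp(-noise_clip, noise_clip)
        a2 = (a2 + noise).clamp(-max_action, max_action)  # target policy smoothing
        q_min = torch.min(q1_t(s2, a2), q2_t(s2, a2))     # clipped double-Q
        return r + gamma * (1.0 - done) * q_min
```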
