Search Results for author: Toshinori Kitamura

Found 7 papers, 3 papers with code

A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees

1 code implementation · 31 Jan 2024 · Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo

We study a primal-dual reinforcement learning (RL) algorithm for the online constrained Markov decision process (CMDP) problem, in which the agent searches for an optimal policy that maximizes return while satisfying constraints.

Reinforcement Learning (RL)
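The primal-dual idea in the abstract above can be sketched on a toy problem. This is a generic Lagrangian loop, not the algorithm from the paper. It assumes a hypothetical single-state CMDP (a 3-armed bandit) with per-arm reward `REWARD`, per-arm cost `COST` and a budget `BUDGET` on expected cost. The policy logits take a gradient-ascent step on the Lagrangian, and the multiplier `lam` takes a projected gradient-descent step. Exact expectations replace sampled returns so that the run is deterministic. All names and step sizes are illustrative assumptions.

```python
import math

# Hypothetical toy CMDP (not from the paper): one state, three actions.
# Goal: maximize E_pi[r] subject to E_pi[c] <= BUDGET.
REWARD = [1.0, 0.5, 0.0]
COST = [1.0, 0.3, 0.0]
BUDGET = 0.5


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expectation(probs, values):
    return sum(p * v for p, v in zip(probs, values))


def primal_dual(steps=20000, lr_theta=0.1, lr_lambda=0.05):
    """Sketch of Lagrangian primal-dual updates with a softmax policy.

    Lagrangian: L(theta, lam) = E_pi[r] - lam * (E_pi[c] - BUDGET), lam >= 0.
    Returns the last policy, the averaged policy and the final multiplier.
    """
    n = len(REWARD)
    theta = [0.0] * n
    lam = 0.0
    prob_sum = [0.0] * n
    for _ in range(steps):
        pi = softmax(theta)
        prob_sum = [s + p for s, p in zip(prob_sum, pi)]
        violation = expectation(pi, COST) - BUDGET
        # Per-action Lagrangian payoff r(a) - lam * c(a).
        payoff = [r - lam * c for r, c in zip(REWARD, COST)]
        baseline = expectation(pi, payoff)
        # Exact softmax policy gradient: pi(a) * (f(a) - E_pi[f]).
        theta = [t + lr_theta * p * (f - baseline)
                 for t, p, f in zip(theta, pi, payoff)]
        # Projected descent on lam (dL/dlam = -violation), kept non-negative.
        lam = max(0.0, lam + lr_lambda * violation)
    avg_pi = [s / steps for s in prob_sum]
    return softmax(theta), avg_pi, lam


if __name__ == "__main__":
    last_pi, avg_pi, lam = primal_dual()
    print("averaged policy:", [round(p, 3) for p in avg_pi])
    print("expected reward:", round(expectation(avg_pi, REWARD), 3))
    print("expected cost:", round(expectation(avg_pi, COST), 3), "budget:", BUDGET)
    print("multiplier:", round(lam, 3))
```

The averaged policy is reported because plain simultaneous primal-dual updates can oscillate around the saddle point. Summing the projected dual updates gives a useful guarantee: the averaged policy's expected cost exceeds the budget by at most `lam / (lr_lambda * steps)`.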

ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives

2 code implementations · 8 Dec 2021 · Toshinori Kitamura, Ryo Yonetani

We present ShinRL, an open-source library specialized for the evaluation of reinforcement learning (RL) algorithms from both theoretical and practical perspectives.

Q-Learning, Reinforcement Learning (RL)

Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning

No code implementations · 16 Jul 2021 · Toshinori Kitamura, Lingwei Zhu, Takamitsu Matsubara

The recent boom in the literature on entropy-regularized reinforcement learning (RL) approaches reveals that Kullback-Leibler (KL) regularization brings advantages to RL algorithms by canceling out errors under mild assumptions.

reinforcement-learning, Reinforcement Learning (RL)
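The error-canceling effect of KL regularization described above can be illustrated with a small example. This shows the standard fixed-coefficient KL-regularized policy update, not Geometric Value Iteration itself. GVI's dynamic, error-aware choice of the coefficient is not reproduced here. The sketch assumes a single state, hypothetical action values `true_q`, and Gaussian noise on each estimate. Each step sets pi_{k+1}(a) ∝ pi_k(a) · exp(q_k(a) / tau). Starting from the uniform policy, unrolling gives pi_K(a) ∝ exp(Σ_k q_k(a) / tau). The policy therefore depends on the noisy estimates only through their sum, which effectively averages independent zero-mean errors.

```python
import math
import random


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    return normalize([math.exp(z - m) for z in logits])


def kl_regularized_step(pi, q, tau):
    """Maximizer of <p, q> - tau * KL(p || pi) over distributions p (hypothetical helper)."""
    return normalize([p * math.exp(v / tau) for p, v in zip(pi, q)])


def run(num_iters=50, tau=5.0, noise_scale=0.5, seed=0):
    rng = random.Random(seed)
    true_q = [1.0, 0.8, 0.0]  # hypothetical action values
    pi = [1.0 / len(true_q)] * len(true_q)  # uniform initial policy
    q_sum = [0.0] * len(true_q)
    for _ in range(num_iters):
        # Each estimate is corrupted by independent zero-mean Gaussian noise.
        q_est = [v + rng.gauss(0.0, noise_scale) for v in true_q]
        pi = kl_regularized_step(pi, q_est, tau)
        q_sum = [s + v for s, v in zip(q_sum, q_est)]
    # Unrolled form: a softmax of the summed estimates gives the same policy.
    closed_form = softmax([s / tau for s in q_sum])
    q_avg = [s / num_iters for s in q_sum]
    return pi, closed_form, q_avg


if __name__ == "__main__":
    pi, closed_form, q_avg = run()
    print("iterated policy:   ", [round(p, 4) for p in pi])
    print("closed-form policy:", [round(p, 4) for p in closed_form])
    print("averaged estimates:", [round(v, 3) for v in q_avg])
```

The averaged estimates sit much closer to `true_q` than any single noisy estimate does. This is the sense in which a fixed KL penalty lets errors cancel. A fixed `tau` also slows learning, which is the trade-off that an adaptive coefficient aims to address.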

Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning

No code implementations · 13 Jul 2021 · Lingwei Zhu, Toshinori Kitamura, Takamitsu Matsubara

In this paper, we propose cautious policy programming (CPP), a novel value-based reinforcement learning (RL) algorithm that can ensure monotonic policy improvement during learning.

Atari Games, reinforcement-learning, +1 more

Cautious Actor-Critic

No code implementations · 12 Jul 2021 · Lingwei Zhu, Toshinori Kitamura, Takamitsu Matsubara

The oscillating performance of off-policy learning and persisting errors in the actor-critic (AC) setting call for algorithms that learn conservatively and are therefore better suited to stability-critical applications.

Continuous Control
