no code implementations • 22 Mar 2023 • Hirotaka Tahara, Hikaru Sasaki, Hanbit Oh, Edgar Anarossi, Takamitsu Matsubara
Under PA, operators perform manual operations (providing actions) and switch between automatic and manual modes (mode-switching).
no code implementations • 27 Jan 2023 • Lingwei Zhu, Zheng Chen, Takamitsu Matsubara, Martha White
Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly.
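As a minimal illustration of the KL penalty described above (a generic sketch, not the algorithm proposed in this paper; the function names and the penalty weight `beta` are illustrative):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions over the same support."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def penalized_objective(expected_return, new_policy, prev_policy, beta=1.0):
    """Generic KL-penalized objective: return minus beta * KL(new || prev).
    A larger KL term (a bigger policy change) lowers the objective,
    discouraging fast policy updates."""
    return expected_return - beta * kl_divergence(new_policy, prev_policy)
```

With `beta = 0` this reduces to the unregularized objective; increasing `beta` trades off return against proximity to the previous policy.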
no code implementations • 21 Sep 2022 • Yuhwan Kwon, Yoshihisa Tsurumine, Takeshi Shimmura, Sadao Kawamura, Takamitsu Matsubara
To cope with this problem, we propose Physically Consistent Preferential Bayesian Optimization (PCPBO) as a method that obtains physically feasible and preferred arrangements that satisfy domain rules.
no code implementations • 29 Jul 2022 • Yuki Kadokawa, Lingwei Zhu, Yoshihisa Tsurumine, Takamitsu Matsubara
Deep reinforcement learning with domain randomization learns a control policy in multiple simulations with randomized physical and sensor model parameters, so that the learned policy transfers to the real world in a zero-shot setting.
no code implementations • 5 Jul 2022 • Tomoya Yamanokuchi, Yuhwan Kwon, Yoshihisa Tsurumine, Eiji Uchibe, Jun Morimoto, Takamitsu Matsubara
However, such works are limited to one-shot transfer, where real-world data must be collected once to perform the sim-to-real transfer; transferring the models learned in simulation to new real-world domains thus still requires significant human effort.
no code implementations • 16 May 2022 • Lingwei Zhu, Zheng Chen, Eiji Uchibe, Takamitsu Matsubara
The recently successful Munchausen Reinforcement Learning (M-RL) features implicit Kullback-Leibler (KL) regularization by augmenting the reward function with the logarithm of the current stochastic policy.
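A minimal sketch of the Munchausen reward augmentation mentioned above, r' = r + α·τ·log π(a|s), with the log-probability clipped from below for numerical stability (a common implementation choice; the default values of `alpha`, `tau`, and `log_clip` here are illustrative, not taken from this paper):

```python
def munchausen_reward(reward, log_pi_a, alpha=0.9, tau=0.03, log_clip=-1.0):
    """Augment the environment reward with the scaled log-probability of the
    action actually taken: r' = r + alpha * tau * max(log pi(a|s), log_clip).
    Since log pi <= 0, the bonus is a penalty for unlikely actions, which
    acts as an implicit KL regularizer toward the current policy."""
    return reward + alpha * tau * max(log_pi_a, log_clip)
```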
no code implementations • 16 May 2022 • Lingwei Zhu, Zheng Chen, Eiji Uchibe, Takamitsu Matsubara
The maximum Tsallis entropy (MTE) framework in reinforcement learning has recently gained popularity by virtue of its flexible modeling choices, which include the widely used Shannon entropy and sparse entropy.
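The flexibility mentioned above comes from the entropic index q: the Tsallis entropy S_q(p) = (1 - Σᵢ pᵢ^q)/(q - 1) recovers the Shannon entropy as q → 1 and the entropy behind sparse policies at q = 2. A small sketch (a generic definition, not this paper's algorithm):

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i**q) / (q - 1).
    In the limit q -> 1 this converges to the Shannon entropy
    -sum_i p_i * log p_i, so q = 1 is handled as a special case."""
    p = np.asarray(p, dtype=float)
    if abs(q - 1.0) < 1e-8:
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))
```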
no code implementations • 18 Jan 2022 • Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Takamitsu Matsubara
As a safety net, it is natural to improve the noise robustness of the optimization algorithm that updates the network parameters in the final stage of learning.
no code implementations • 16 Jul 2021 • Toshinori Kitamura, Lingwei Zhu, Takamitsu Matsubara
The recent boom in the literature on entropy-regularized reinforcement learning (RL) approaches reveals that Kullback-Leibler (KL) regularization brings advantages to RL algorithms by canceling out errors under mild assumptions.
no code implementations • 13 Jul 2021 • Lingwei Zhu, Toshinori Kitamura, Takamitsu Matsubara
In this paper, we propose cautious policy programming (CPP), a novel value-based reinforcement learning (RL) algorithm that can ensure monotonic policy improvement during learning.
no code implementations • 12 Jul 2021 • Lingwei Zhu, Toshinori Kitamura, Takamitsu Matsubara
The oscillating performance of off-policy learning and persistent errors in the actor-critic (AC) setting call for algorithms that can learn conservatively, to better suit stability-critical applications.
no code implementations • 29 Mar 2021 • Kazuki Shibata, Tomohiko Jimbo, Takamitsu Matsubara
In this paper, we explore a multi-agent reinforcement learning approach to address the design problem of communication and control strategies for multi-agent cooperative transport.
Multi-agent Reinforcement Learning
reinforcement-learning
no code implementations • 16 Oct 2020 • Cheng-Yu Kuo, Andreas Schaarschmidt, Yunduan Cui, Tamim Asfour, Takamitsu Matsubara
In typical MBRL, the data-driven model cannot be expected to generate accurate and reliable policies for the intended robotic tasks during the learning process, due to sample scarcity.
Model-based Reinforcement Learning
reinforcement-learning
no code implementations • 25 Aug 2020 • Lingwei Zhu, Takamitsu Matsubara
We propose a novel reinforcement learning algorithm that exploits this lower bound as a criterion for adjusting the degree of a policy update, alleviating policy oscillation.
no code implementations • 2 Mar 2020 • Cristian Camilo Beltran-Hernandez, Damien Petit, Ixchel G. Ramirez-Alpizar, Takayuki Nishi, Shinichi Kikuchi, Takamitsu Matsubara, Kensuke Harada
Thirdly, we developed a fail-safe mechanism for safely training an RL agent on manipulation tasks using a real rigid robot manipulator.