Search Results for author: Eiji Uchibe

Found 11 papers, 0 papers with code

Reward-Punishment Reinforcement Learning with Maximum Entropy

no code implementations • 20 May 2024 • Jiexin Wang, Eiji Uchibe

We introduce the "soft Deep MaxPain" (softDMP) algorithm, which integrates the optimization of long-term policy entropy into reward-punishment reinforcement learning objectives.


Randomized-to-Canonical Model Predictive Control for Real-world Visual Robotic Manipulation

no code implementations • 5 Jul 2022 • Tomoya Yamanokuchi, Yuhwan Kwon, Yoshihisa Tsurumine, Eiji Uchibe, Jun Morimoto, Takamitsu Matsubara

However, such works are limited to one-shot transfer, where real-world data must be collected once to perform the sim-to-real transfer, which remains a significant human effort in transferring the models learned in simulations to new domains in the real world.

Model Predictive Control

Model-Based Imitation Learning Using Entropy Regularization of Model and Policy

no code implementations • 21 Jun 2022 • Eiji Uchibe

We derive structured discriminators so that the learning of the policy and the model is efficient.

counterfactual Imitation Learning +3

Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning

no code implementations • 16 May 2022 • Lingwei Zhu, Zheng Chen, Eiji Uchibe, Takamitsu Matsubara

The maximum Tsallis entropy (MTE) framework in reinforcement learning has gained popularity recently by virtue of its flexible modeling choices, including the widely used Shannon entropy and sparse entropy.

reinforcement-learning Reinforcement Learning (RL)

$q$-Munchausen Reinforcement Learning

no code implementations • 16 May 2022 • Lingwei Zhu, Zheng Chen, Eiji Uchibe, Takamitsu Matsubara

The recently successful Munchausen Reinforcement Learning (M-RL) features implicit Kullback-Leibler (KL) regularization by augmenting the reward function with the logarithm of the current stochastic policy.

reinforcement-learning Reinforcement Learning (RL)
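
The reward augmentation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a softmax policy derived from Q-values, a temperature `tau`, a scaling factor `alpha`, and a lower clip on the log-policy term; these names, default values, and the clipping are assumptions for illustration and are not taken from the listing.

```python
import math


def log_softmax(q_values, tau):
    """Log-probabilities of a softmax policy over Q-values at temperature tau."""
    scaled = [q / tau for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def munchausen_reward(reward, q_values, action, tau=0.03, alpha=0.9, clip_min=-1.0):
    """Add a scaled log-policy bonus to the reward (hypothetical helper).

    The bonus tau * log pi(action | state) is never positive, so the
    augmented reward is at most the original reward. clip_min bounds the
    bonus from below so that near-zero probabilities do not dominate.
    """
    log_pi = log_softmax(q_values, tau)[action]
    bonus = max(tau * log_pi, clip_min)
    return reward + alpha * bonus
```

For example, `munchausen_reward(1.0, [1.0, 1.0], 0, tau=1.0, alpha=1.0)` adds `log(0.5)` to the reward, because both actions are equally likely under the softmax policy.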

Unbounded Output Networks for Classification

no code implementations • 25 Jul 2018 • Stefan Elfwing, Eiji Uchibe, Kenji Doya

In this study, by adopting features of the EE-RBM approach to feed-forward neural networks, we propose the UnBounded output network (UBnet) which is characterized by three features: (1) unbounded output units; (2) the target value of correct classification is set to a value much greater than one; and (3) the models are trained by a modified mean-squared error objective.

Classification General Classification

Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

no code implementations • 30 Oct 2017 • Tadashi Kozuno, Eiji Uchibe, Kenji Doya

Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks.

reinforcement-learning Reinforcement Learning (RL)

Online Meta-learning by Parallel Algorithm Competition

no code implementations • 24 Feb 2017 • Stefan Elfwing, Eiji Uchibe, Kenji Doya

In the OMPAC method, several instances of a reinforcement learning algorithm are run in parallel with small differences in the initial values of the meta-parameters.

Atari Games Meta-Learning +3

Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning

no code implementations • 10 Feb 2017 • Stefan Elfwing, Eiji Uchibe, Kenji Doya

First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU).

Atari Games reinforcement-learning +1
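
As a small illustration of the two activation functions named in this abstract, the sketch below assumes the standard definitions: SiLU(x) = x · σ(x), with dSiLU as its derivative, σ(x)(1 + x(1 − σ(x))). The function names are chosen here for illustration, and plain Python floats are used instead of a tensor library.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x: float) -> float:
    """Sigmoid-weighted linear unit: the input multiplied by its sigmoid."""
    return x * sigmoid(x)


def dsilu(x: float) -> float:
    """Derivative of SiLU with respect to x, used as an activation itself."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))
```

At zero, `silu(0.0)` is `0.0` and `dsilu(0.0)` is `0.5`; for large positive inputs SiLU approaches the identity, and for large negative inputs it approaches zero.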

A Generalized Natural Actor-Critic Algorithm

no code implementations • NeurIPS 2009 • Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Kenji Doya

In this paper, we describe a generalized Natural Gradient (gNG) by linearly interpolating the two FIMs and propose an efficient implementation for the gNG learning based on a theory of the estimating function, generalized Natural Actor-Critic (gNAC).

Reinforcement Learning (RL)
