Search Results for author: Tadashi Kozuno

Found 10 papers, 2 papers with code

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

no code implementations • 17 Jul 2021 • Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White

Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification.

Policy Gradient Methods
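The two greedification objectives the title contrasts can be written out directly for a discrete action set. Below is a minimal NumPy sketch, not code from the paper; the Q-values, temperature, and policy logits are illustrative assumptions.

```python
import numpy as np

def softmax(x, tau=1.0):
    z = x / tau
    z = z - z.max()          # numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical action values and a parameterized policy over 4 discrete actions.
q = np.array([1.0, 0.5, -0.2, 0.1])
pi = softmax(np.array([0.2, 0.1, 0.0, -0.1]))   # current policy probabilities
boltzmann = softmax(q, tau=0.5)                 # Boltzmann target built from Q

# Forward KL: KL(Boltzmann(Q) || pi), often described as mass-covering.
forward_kl = np.sum(boltzmann * (np.log(boltzmann) - np.log(pi)))

# Reverse KL: KL(pi || Boltzmann(Q)), often described as mode-seeking.
reverse_kl = np.sum(pi * (np.log(pi) - np.log(boltzmann)))

print(forward_kl, reverse_kl)
```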

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

no code implementations • 11 Jun 2021 • Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play.

Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning

1 code implementation • 31 Mar 2021 • Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Shane Gu

These results show which implementation details are co-adapted and co-evolved with algorithms, and which are transferable across them: for example, we identified that the tanh Gaussian policy and network sizes are highly adapted to the algorithm type, while layer normalization and ELU are critical for MPO's performance yet also transfer to noticeable gains in SAC.
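As a concrete illustration of the implementation details named in the abstract, here is a minimal PyTorch sketch of a tanh-squashed Gaussian policy head with a LayerNorm + ELU trunk. It is an assumed, generic architecture for illustration, not the networks used in the paper; the hidden size and log-std clamp are arbitrary choices.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class TanhGaussianPolicy(nn.Module):
    """Illustrative policy: LayerNorm + ELU trunk, tanh-squashed Gaussian head."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.LayerNorm(hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ELU(),
        )
        self.mean = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.trunk(obs)
        mean, log_std = self.mean(h), self.log_std(h).clamp(-5, 2)
        dist = Normal(mean, log_std.exp())
        u = dist.rsample()                 # reparameterized Gaussian sample
        a = torch.tanh(u)                  # squash actions into [-1, 1]
        # change-of-variables correction for the tanh squashing
        log_prob = dist.log_prob(u).sum(-1) - torch.log(1 - a.pow(2) + 1e-6).sum(-1)
        return a, log_prob
```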

Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning

no code implementations • 27 Feb 2021 • Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically sound and practically effective algorithm.

Continuous Control
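For reference, Peng's Q($\lambda$) target is usually written as the backward recursion G_t = r_t + γ[(1 − λ) max_a Q(s_{t+1}, a) + λ G_{t+1}], with no off-policy truncation of the trace (unlike Watkins' Q($\lambda$)). The sketch below implements that recursion for a single trajectory segment; the function name and array layout are assumptions, not the paper's code.

```python
import numpy as np

def peng_q_lambda_targets(rewards, q_next_max, dones, gamma=0.99, lam=0.9):
    """Backward recursion for Peng's Q(lambda) returns along one segment.
       rewards[t]    = r_t
       q_next_max[t] = max_a Q(s_{t+1}, a)
       dones[t]      = True if the episode terminates at step t
    """
    T = len(rewards)
    targets = np.zeros(T)
    g = q_next_max[-1]                    # bootstrap from the last state's max-Q
    for t in reversed(range(T)):
        if dones[t]:
            g = rewards[t]                # terminal step: no bootstrap
        else:
            g = rewards[t] + gamma * ((1 - lam) * q_next_max[t] + lam * g)
        targets[t] = g
    return targets
```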

Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning

no code implementations • NeurIPS 2020 • Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.
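The KL-regularized greedy step at the heart of such algorithms has a simple closed form for discrete actions: maximizing ⟨π, q⟩ − λ KL(π ‖ π_prior) over π gives π ∝ π_prior · exp(q / λ). Below is a minimal NumPy sketch with illustrative values; it is not code from the paper.

```python
import numpy as np

def kl_regularized_greedy(q, prior, lam=1.0):
    """Closed-form solution of  argmax_pi <pi, q> - lam * KL(pi || prior)
       for a discrete action set: pi proportional to prior * exp(q / lam)."""
    logits = np.log(prior) + q / lam
    logits = logits - logits.max()        # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Illustrative use: repeated KL-regularized greedy steps implicitly average q-estimates.
prior = np.ones(3) / 3
q = np.array([1.0, 0.2, -0.5])
pi = kl_regularized_greedy(q, prior, lam=0.5)
```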

Leverage the Average: an Analysis of KL Regularization in RL

no code implementations • 31 Mar 2020 • Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.

Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning

no code implementations • 18 Jun 2019 • Tadashi Kozuno, Dongqi Han, Kenji Doya

We provide a detailed theoretical analysis of the new algorithm, showing the efficiency and noise tolerance it inherits from Retrace and advantage learning.
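For context, the Retrace correction mentioned in the abstract is the off-policy return Σ_{s≥t} γ^{s−t} (∏ c_i) δ_s with truncated importance weights c_i = λ min(1, ρ_i). The sketch below computes these targets for one trajectory segment; it is the generic Retrace($\lambda$) update, not the paper's gap-increasing algorithm, and the function signature is an assumption.

```python
import numpy as np

def retrace_targets(q_sa, exp_q_next, rewards, rho, gamma=0.99, lam=1.0):
    """Retrace(lambda) targets along one trajectory segment.
       q_sa[t]       = Q(x_t, a_t)
       exp_q_next[t] = E_{a ~ pi} Q(x_{t+1}, a)
       rho[t]        = pi(a_t | x_t) / mu(a_t | x_t)  (importance ratio)
    """
    T = len(rewards)
    c = lam * np.minimum(1.0, rho)                 # truncated importance weights
    delta = rewards + gamma * exp_q_next - q_sa    # TD errors under pi
    targets = np.array(q_sa, dtype=float)
    acc = 0.0
    for t in reversed(range(T)):
        acc = delta[t] + (gamma * c[t + 1] * acc if t + 1 < T else 0.0)
        targets[t] = q_sa[t] + acc
    return targets
```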

Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

no code implementations • 30 Oct 2017 • Tadashi Kozuno, Eiji Uchibe, Kenji Doya

Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and better approximate dynamic programming algorithms are expected to further extend the applicability of reinforcement learning.
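Two of the operators the title refers to are easy to state: the Bellman optimality operator behind value iteration, and the gap-increasing advantage-learning operator (T_AL q)(s, a) = (T q)(s, a) − α (max_b q(s, b) − q(s, a)). Below is a minimal tabular NumPy sketch of both; DPP's softmax-based counterpart is omitted, and the array shapes and names are assumptions, not the paper's code.

```python
import numpy as np

def bellman_optimality(q, P, R, gamma=0.99):
    """One value-iteration step: (T q)(s, a) = R[s, a] + gamma * sum_s' P[s, a, s'] * max_b q[s', b].
       q has shape (S, A), P has shape (S, A, S), R has shape (S, A)."""
    v = q.max(axis=1)            # greedy state values, shape (S,)
    return R + gamma * P @ v     # shape (S, A)

def advantage_learning(q, P, R, gamma=0.99, alpha=0.5):
    """Advantage-learning operator, a gap-increasing variant of T:
       (T_AL q)(s, a) = (T q)(s, a) - alpha * (max_b q[s, b] - q[s, a])."""
    tq = bellman_optimality(q, P, R, gamma)
    return tq - alpha * (q.max(axis=1, keepdims=True) - q)
```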
