Search Results for author: Tadashi Kozuno

Found 23 papers, 7 papers with code

Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist

no code implementations • 28 Feb 2024 • Hai Nguyen, Tadashi Kozuno, Cristian C. Beltran-Hernandez, Masashi Hamaya

This study tackles the representative yet challenging contact-rich peg-in-hole task of robotic assembly, using a soft wrist that can operate more safely and tolerate lower-frequency control signals than a rigid one.

A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees

1 code implementation • 31 Jan 2024 • Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo

We study a primal-dual reinforcement learning (RL) algorithm for the online constrained Markov decision process (CMDP) problem, wherein the agent seeks an optimal policy that maximizes return while satisfying constraints.

Reinforcement Learning (RL)
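
As a generic sketch of the primal-dual idea described above (the notation $V_r^\pi$, $V_c^\pi$, and threshold $b$ is mine, not the paper's): the constraint $V_c^\pi \ge b$ is folded into a Lagrangian, the primal player ascends in the policy $\pi$, and the dual player descends in the multiplier $\lambda$:

$$\max_{\pi}\ \min_{\lambda \ge 0}\ \ L(\pi, \lambda) = V_r^{\pi} + \lambda\,\big(V_c^{\pi} - b\big).$$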

Local and adaptive mirror descents in extensive-form games

no code implementations • 1 Sep 2023 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback.
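
For reference, a single plain mirror-descent step, of which the paper studies local and adaptive variants: given a loss estimate $\hat{\ell}_t$, learning rate $\eta$, and Bregman divergence $D_\psi$ (notation mine),

$$x_{t+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}}\ \eta\,\langle \hat{\ell}_t, x\rangle + D_{\psi}(x, x_t).$$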

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

no code implementations • 29 May 2023 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko

Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings.
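
As a standard illustration of such lookahead (not DoMo-AC itself), the $n$-step return bootstraps from a value estimate only after accumulating $n$ rewards:

$$G_t^{(n)} = \sum_{k=0}^{n-1}\gamma^{k}\, r_{t+k} \;+\; \gamma^{n}\, V(s_{t+n}).$$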

Robust Markov Decision Processes without Model Estimation

no code implementations • 2 Feb 2023 • Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang

Moreover, we prove that the alternative form still plays a similar role to the original form.

Adapting to game trees in zero-sum imperfect information games

1 code implementation • 23 Dec 2022 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

Imperfect information games (IIG) are games in which each player only partially observes the current game state.

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

no code implementations • 27 Oct 2022 • Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI). CAPI computes a deterministic stationary policy with an optimal error bound that scales linearly with the product of the effective horizon $H$ and the worst-case approximation error $\epsilon$ of the action-value functions of stationary policies.
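
Reading off the bound stated above in the discounted setting, where the effective horizon is $H = 1/(1-\gamma)$: the policy $\hat{\pi}$ returned by CAPI satisfies, up to constants,

$$\|v^{*} - v^{\hat{\pi}}\|_{\infty} = O(H\,\epsilon),$$

which improves on the classical $O(H^{2}\epsilon)$-type guarantee for approximate policy iteration.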

Learning in two-player zero-sum partially observable Markov games with perfect recall

no code implementations • NeurIPS 2021 • Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an extensive game with imperfect information (EGII) through self-play.

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

no code implementations • 17 Jul 2021 • Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White

Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification.

Policy Gradient Methods
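
A minimal sketch of the two greedification directions studied here, assuming a Boltzmann target $\mathcal{B}_{\tau}Q(a\,|\,s) \propto \exp\big(Q(s,a)/\tau\big)$ (notation mine): the reverse-KL projection is mode-seeking, while the forward-KL projection is mass-covering:

$$\pi' = \operatorname*{arg\,min}_{\pi}\ \mathrm{KL}\big(\pi \,\|\, \mathcal{B}_{\tau}Q\big) \quad \text{(reverse KL)}, \qquad \pi' = \operatorname*{arg\,min}_{\pi}\ \mathrm{KL}\big(\mathcal{B}_{\tau}Q \,\|\, \pi\big) \quad \text{(forward KL)}.$$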

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

no code implementations • 11 Jun 2021 • Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play.

Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning

1 code implementation • NeurIPS 2021 • Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Shane Gu

These results show which implementation or code details are co-adapted and co-evolved with algorithms, and which are transferable across algorithms. For example, we identified that tanh Gaussian policies and network sizes are highly adapted to algorithmic types, while layer normalization and ELU are critical for MPO's performance but also transfer to noticeable gains in SAC.

Reinforcement Learning (RL)

Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning

no code implementations • 27 Feb 2021 • Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically-sound and practically effective algorithm.

Continuous Control, Reinforcement Learning (RL) +1
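
For context, Peng's Q($\lambda$) can be written as the recursive return below, which bootstraps through a $\max$ at every step and never cuts traces on off-policy actions; this is the source of the classical "unsafe" concern the paper revisits:

$$G_t = r_t + \gamma\Big[(1-\lambda)\max_{a'} Q(s_{t+1}, a') + \lambda\, G_{t+1}\Big].$$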

Leverage the Average: an Analysis of KL Regularization in RL

no code implementations • 31 Mar 2020 • Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.

Reinforcement Learning (RL)
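
A minimal sketch of the mechanism this kind of analysis rests on (uniform initial policy assumed; notation mine): a KL-regularized greedy step with temperature $\tau$ has the closed form

$$\pi_{k+1}(\cdot\,|\,s) \;\propto\; \pi_{k}(\cdot\,|\,s)\,\exp\!\big(q_{k}(s,\cdot)/\tau\big) \;\propto\; \exp\!\Big(\tfrac{1}{\tau}\sum_{j=0}^{k} q_{j}(s,\cdot)\Big),$$

so the policy depends on the running sum of past action-value estimates, averaging out their errors; hence "leverage the average."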

Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning

no code implementations • 18 Jun 2019 • Tadashi Kozuno, Dongqi Han, Kenji Doya

We provide detailed theoretical analysis of the new algorithm that shows its efficiency and noise-tolerance inherited from Retrace and advantage learning.

Reinforcement Learning (RL)
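
For reference, the gap-increasing mechanism inherited from advantage learning modifies the Bellman backup $T$ to widen the action gap (a generic form, not necessarily this paper's exact operator):

$$(T_{\mathrm{AL}}Q)(s,a) = (TQ)(s,a) - \alpha\Big[\max_{a'} Q(s,a') - Q(s,a)\Big], \qquad \alpha \in [0,1),$$

which pushes down non-greedy action values and makes the greedy choice more robust to estimation noise.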

Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

no code implementations • 30 Oct 2017 • Tadashi Kozuno, Eiji Uchibe, Kenji Doya

Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks; a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks.

Reinforcement Learning (RL)
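
A hedged sketch of how such a unification can look (my paraphrase under assumed notation, not necessarily the paper's exact scheme): with an action-aggregation operator $m$ and a gap coefficient $\alpha$,

$$Q_{k+1}(s,a) = r(s,a) + \gamma\,\mathbb{E}_{s'}\big[m(Q_k)(s')\big] + \alpha\big(Q_k(s,a) - m(Q_k)(s)\big),$$

value iteration corresponds to $\alpha = 0$ with $m = \max$, advantage learning to $\alpha \in (0,1)$ with $m = \max$, and dynamic policy programming to $\alpha = 1$ with $m$ a softmax.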
