Search Results for author: Tadashi Kozuno

Found 23 papers, 7 papers with code

Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist

no code implementations • 28 Feb 2024 • Hai Nguyen, Tadashi Kozuno, Cristian C. Beltran-Hernandez, Masashi Hamaya

This study tackles the representative yet challenging contact-rich peg-in-hole task of robotic assembly, using a soft wrist that can operate more safely and tolerate lower-frequency control signals than a rigid one.

A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees

1 code implementation • 31 Jan 2024 • Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo

We study a primal-dual reinforcement learning (RL) algorithm for the online constrained Markov decision process (CMDP) problem, wherein the agent seeks an optimal policy that maximizes return while satisfying constraints.

Reinforcement Learning (RL)
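
As a generic sketch of the primal-dual idea described above (the notation $V_r^\pi$, $V_c^\pi$, and threshold $b$ is mine, not the paper's): the constraint $V_c^\pi \ge b$ is folded into a Lagrangian, the primal player ascends in the policy $\pi$, and the dual player descends in the multiplier $\lambda$:

$$\max_{\pi}\ \min_{\lambda \ge 0}\ \ L(\pi, \lambda) = V_r^{\pi} + \lambda\,\big(V_c^{\pi} - b\big).$$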

Local and adaptive mirror descents in extensive-form games

no code implementations • 1 Sep 2023 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback.
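
For reference, a single plain mirror-descent step, of which the paper studies local and adaptive variants: given a loss estimate $\hat{\ell}_t$, learning rate $\eta$, and Bregman divergence $D_\psi$ (notation mine),

$$x_{t+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}}\ \eta\,\langle \hat{\ell}_t, x\rangle + D_{\psi}(x, x_t).$$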

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

no code implementations • 29 May 2023 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko

Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings.
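
As a standard illustration of such lookahead (not DoMo-AC itself), the $n$-step return bootstraps from a value estimate only after accumulating $n$ rewards:

$$G_t^{(n)} = \sum_{k=0}^{n-1}\gamma^{k}\, r_{t+k} \;+\; \gamma^{n}\, V(s_{t+n}).$$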

Robust Markov Decision Processes without Model Estimation

no code implementations • 2 Feb 2023 • Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang

Moreover, we prove that the alternative form still plays a similar role to the original form.

Adapting to game trees in zero-sum imperfect information games

1 code implementation • 23 Dec 2022 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

Imperfect information games (IIG) are games in which each player only partially observes the current game state.

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

no code implementations • 27 Oct 2022 • Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI). CAPI computes a deterministic stationary policy with an optimal error bound that scales linearly with the product of the effective horizon $H$ and the worst-case approximation error $\epsilon$ of the action-value functions of stationary policies.
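
Reading off the bound stated above in the discounted setting, where the effective horizon is $H = 1/(1-\gamma)$: the policy $\hat{\pi}$ returned by CAPI satisfies, up to constants,

$$\|v^{*} - v^{\hat{\pi}}\|_{\infty} = O(H\,\epsilon),$$

which improves on the classical $O(H^{2}\epsilon)$-type guarantee for approximate policy iteration.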

Learning in two-player zero-sum partially observable Markov games with perfect recall

no code implementations • NeurIPS 2021 • Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an extensive game with imperfect information (EGII) through self-play.

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

no code implementations • 17 Jul 2021 • Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White

Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification.

Policy Gradient Methods
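
A minimal sketch of the two greedification directions studied here, assuming a Boltzmann target $\mathcal{B}_{\tau}Q(a\,|\,s) \propto \exp\big(Q(s,a)/\tau\big)$ (notation mine): the reverse-KL projection is mode-seeking, while the forward-KL projection is mass-covering:

$$\pi' = \operatorname*{arg\,min}_{\pi}\ \mathrm{KL}\big(\pi \,\|\, \mathcal{B}_{\tau}Q\big) \quad \text{(reverse KL)}, \qquad \pi' = \operatorname*{arg\,min}_{\pi}\ \mathrm{KL}\big(\mathcal{B}_{\tau}Q \,\|\, \pi\big) \quad \text{(forward KL)}.$$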

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

no code implementations • 11 Jun 2021 • Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play.

Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning

1 code implementation • NeurIPS 2021 • Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Shane Gu

These results show which implementation or code details are co-adapted and co-evolved with algorithms, and which are transferable across algorithms. For example, we identified that tanh Gaussian policies and network sizes are highly adapted to algorithmic types, while layer normalization and ELU are critical for MPO's performance but also transfer to noticeable gains in SAC.

Reinforcement Learning (RL)

Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning

no code implementations • 27 Feb 2021 • Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically-sound and practically effective algorithm.

Continuous Control, Reinforcement Learning (RL) +1
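
For context, Peng's Q($\lambda$) can be written as the recursive return below, which bootstraps through a $\max$ at every step and never cuts traces on off-policy actions; this is the source of the classical "unsafe" concern the paper revisits:

$$G_t = r_t + \gamma\Big[(1-\lambda)\max_{a'} Q(s_{t+1}, a') + \lambda\, G_{t+1}\Big].$$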

Leverage the Average: an Analysis of KL Regularization in RL

no code implementations • 31 Mar 2020 • Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.

Reinforcement Learning (RL)
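
A minimal sketch of the mechanism this kind of analysis rests on (uniform initial policy assumed; notation mine): a KL-regularized greedy step with temperature $\tau$ has the closed form

$$\pi_{k+1}(\cdot\,|\,s) \;\propto\; \pi_{k}(\cdot\,|\,s)\,\exp\!\big(q_{k}(s,\cdot)/\tau\big) \;\propto\; \exp\!\Big(\tfrac{1}{\tau}\sum_{j=0}^{k} q_{j}(s,\cdot)\Big),$$

so the policy depends on the running sum of past action-value estimates, averaging out their errors; hence "leverage the average."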

Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning

no code implementations • 18 Jun 2019 • Tadashi Kozuno, Dongqi Han, Kenji Doya

We provide detailed theoretical analysis of the new algorithm that shows its efficiency and noise-tolerance inherited from Retrace and advantage learning.

Reinforcement Learning (RL)
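
For reference, the gap-increasing mechanism inherited from advantage learning modifies the Bellman backup $T$ to widen the action gap (a generic form, not necessarily this paper's exact operator):

$$(T_{\mathrm{AL}}Q)(s,a) = (TQ)(s,a) - \alpha\Big[\max_{a'} Q(s,a') - Q(s,a)\Big], \qquad \alpha \in [0,1),$$

which pushes down non-greedy action values and makes the greedy choice more robust to estimation noise.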

Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

no code implementations • 30 Oct 2017 • Tadashi Kozuno, Eiji Uchibe, Kenji Doya

Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks; a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks.

Reinforcement Learning (RL)
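
A hedged sketch of how such a unification can look (my paraphrase under assumed notation, not necessarily the paper's exact scheme): with an action-aggregation operator $m$ and a gap coefficient $\alpha$,

$$Q_{k+1}(s,a) = r(s,a) + \gamma\,\mathbb{E}_{s'}\big[m(Q_k)(s')\big] + \alpha\big(Q_k(s,a) - m(Q_k)(s)\big),$$

value iteration corresponds to $\alpha = 0$ with $m = \max$, advantage learning to $\alpha \in (0,1)$ with $m = \max$, and dynamic policy programming to $\alpha = 1$ with $m$ a softmax.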
