no code implementations • 10 Sep 2024 • Yuya Fujisaki, Shiro Takagi, Hideki Asoh, Wataru Kumagai
We expect our dataset to provide a foundation for further research on developing better evaluation functions tailored to the RQ extraction task and to help improve performance on the task.
1 code implementation • 29 Aug 2024 • Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo
We first prove that the conventional Lagrangian max-min formulation with policy gradient methods can become trapped in suboptimal solutions by encountering a sum of conflicting gradients from the objective and constraint functions during its inner minimization problem.
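As a toy illustration of the conflicting-gradient failure mode described above (not the paper's CMDP setting), the sketch below builds a Lagrangian whose objective and constraint gradients point in opposite directions; at a particular multiplier value their sum vanishes, so gradient descent on the inner minimization makes no progress from a suboptimal point:

```python
import numpy as np

# Toy Lagrangian L(x, lam) = f(x) + lam * g(x) whose objective and
# constraint gradients oppose each other exactly, so their sum is zero
# and plain gradient descent on x stalls wherever it starts.
def grad_L(x, lam):
    grad_f = np.array([1.0])       # gradient of the objective f(x) = x
    grad_g = np.array([-1.0])      # gradient of the constraint g(x) = -x + 1
    return grad_f + lam * grad_g

x = np.array([5.0])
lam = 1.0                          # multiplier at which the gradients cancel
for _ in range(100):
    x = x - 0.1 * grad_L(x, lam)

print(x)                           # unchanged: the summed gradient is zero
```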
1 code implementation • 31 Jan 2024 • Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo
We study a primal-dual (PD) reinforcement learning (RL) algorithm for online constrained Markov decision processes (CMDPs).
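A generic primal-dual scheme (not the paper's CMDP algorithm) can be sketched on a toy convex problem: a descent step on the primal variable alternates with a projected ascent step on the Lagrange multiplier:

```python
# Minimal primal-dual sketch: minimize f(x) = x^2 subject to
# g(x) = 1 - x <= 0, alternating a primal descent step on x with a
# dual ascent step on lam, projected onto lam >= 0.
def f_grad(x):
    return 2.0 * x

def g(x):
    return 1.0 - x

x, lam, eta = 0.0, 0.0, 0.05
for _ in range(2000):
    x -= eta * (f_grad(x) + lam * (-1.0))  # primal: descend the Lagrangian
    lam = max(0.0, lam + eta * g(x))       # dual: ascend on constraint violation

print(round(x, 2))                         # near 1.0, the constrained optimum
```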
no code implementations • 16 Nov 2023 • Shiro Takagi, Ryutaro Yamauchi, Wataru Kumagai
Research automation efforts usually employ AI as a tool to automate specific tasks within the research process.
no code implementations • 21 Sep 2023 • Ryutaro Yamauchi, Sho Sonoda, Akiyoshi Sannai, Wataru Kumagai
In this paper, we propose a novel framework that integrates the Chain-of-Thought (CoT) method with an external tool (Python REPL).
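A minimal sketch of such an execution loop (helper names are illustrative, not the paper's implementation): a Python code block embedded in the model's chain of thought is run in a REPL-like sandbox, and the printed output is appended back so later reasoning steps can use the exact result:

```python
import contextlib
import io
import re

# Extract a ```python ...``` block from the reasoning text, execute it,
# and append its stdout as an "Observation" for the next reasoning step.
CODE_BLOCK = re.compile(r"```python\n(.*?)```", re.DOTALL)

def run_code_in_thought(thought: str) -> str:
    match = CODE_BLOCK.search(thought)
    if match is None:
        return thought                      # nothing to execute
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(match.group(1), {})            # the external-tool step
    return thought + "\nObservation: " + buffer.getvalue().strip()

thought = (
    "To find 17 * 23 I will compute it exactly:\n"
    "```python\nprint(17 * 23)\n```"
)
print(run_code_in_thought(thought))         # ends with "Observation: 391"
```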
1 code implementation • 22 May 2023 • Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms.
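The KL-regularized policy update at the heart of MDVI can be sketched in the simplest (bandit, tabular) case: the new policy is the old policy reweighted by exponentiated action values, which is precisely a mirror descent step under the KL divergence:

```python
import numpy as np

# Toy bandit-case sketch (no transitions): each mirror descent step
# multiplies the current policy by exp(Q / tau) and renormalizes,
# i.e. a KL-regularized greedy improvement toward high-value actions.
def mdvi_policy_step(pi, q, tau=1.0):
    new_pi = pi * np.exp(q / tau)
    return new_pi / new_pi.sum()

q = np.array([1.0, 0.5, 0.0])     # fixed action values
pi = np.ones(3) / 3               # start from the uniform policy
for _ in range(50):
    pi = mdvi_policy_step(pi, q, tau=1.0)

print(pi.argmax())                # mass concentrates on the best action
```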
1 code implementation • 15 Sep 2022 • Shohei Taniguchi, Yusuke Iwasawa, Wataru Kumagai, Yutaka Matsuo
Based on the ALD, we also present a new deep latent variable model named the Langevin autoencoder (LAE).
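The basic ingredient behind such a sampler, (unadjusted) Langevin dynamics, can be sketched in isolation: noisy gradient steps on the log-density, here targeting a standard normal so that the gradient of log p(z) is simply -z:

```python
import numpy as np

# Unadjusted Langevin dynamics targeting N(0, 1): each update takes a
# half gradient step on log p(z) = -z^2/2 plus Gaussian noise scaled
# by the square root of the step size.
rng = np.random.default_rng(0)

def langevin_step(z, step=0.1):
    grad_log_p = -z
    return z + 0.5 * step * grad_log_p + np.sqrt(step) * rng.normal(size=z.shape)

z = np.full(10000, 5.0)           # start all chains far from the target
for _ in range(500):
    z = langevin_step(z)

print(round(z.mean(), 1), round(z.std(), 1))   # near 0.0 and 1.0
```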
no code implementations • 15 Oct 2021 • Akiyoshi Sannai, Makoto Kawano, Wataru Kumagai
We construct learning models based on the reductive Reynolds operator, called equivariant and invariant Reynolds networks (ReyNets), and prove that they have the universal approximation property.
no code implementations • 29 Sep 2021 • Akiyoshi Sannai, Makoto Kawano, Wataru Kumagai
To overcome this difficulty, we consider representing the Reynolds operator as a sum over a subset instead of a sum over the whole group.
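For intuition, the full Reynolds operator for the symmetric group S_3 can be sketched directly: averaging a function over every group element makes it permutation invariant, and the cost of this full sum over the group is exactly what motivates replacing it with a sum over a well-chosen subset:

```python
from itertools import permutations

# Reynolds operator for S_3: average f over all 6 permutations of the
# input coordinates, producing a permutation-invariant function.
def reynolds(f, x):
    perms = list(permutations(range(len(x))))
    return sum(f([x[i] for i in p]) for p in perms) / len(perms)

def f(x):                          # not invariant on its own
    return x[0] * 1.0 + x[1] * 2.0 + x[2] * 3.0

x = [1.0, 2.0, 3.0]
y = [3.0, 1.0, 2.0]                # a permutation of x
print(reynolds(f, x), reynolds(f, y))   # equal: the average is invariant
```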
no code implementations • ICLR 2021 • Makoto Kawano, Wataru Kumagai, Akiyoshi Sannai, Yusuke Iwasawa, Yutaka Matsuo
We present the group equivariant conditional neural process (EquivCNP), a meta-learning method that is permutation invariant over the data set, as in conventional conditional neural processes (CNPs), and is additionally equivariant to transformations of the data space.
no code implementations • 1 Jan 2021 • Yuki Mae, Wataru Kumagai, Takafumi Kanamori
We report the computational efficiency and statistical reliability of our method in numerical experiments on language modeling with RNNs and on out-of-distribution detection with DNNs.
no code implementations • 27 Dec 2020 • Wataru Kumagai, Akiyoshi Sannai
However, universal approximation theorems for CNNs have been separately derived with individual techniques according to each group and setting.
no code implementations • 2 Jun 2018 • Kota Matsui, Wataru Kumagai, Kenta Kanamori, Mitsuaki Nishikimi, Takafumi Kanamori
In this paper, we propose a variable selection method for general nonparametric kernel-based estimation.
no code implementations • NeurIPS 2017 • Wataru Kumagai
The dueling bandit is a learning framework wherein the feedback information in the learning process is restricted to a noisy comparison between a pair of actions.
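This feedback model can be simulated in a few lines: the learner never observes rewards, only the noisy winner of a pairwise comparison, with action i beating action j with some fixed probability:

```python
import random

# Dueling-bandit feedback simulator: P[i][j] = Pr(action i beats action j).
# Action 0 is the Condorcet winner, beating every other action with
# probability above one half.
random.seed(0)

P = [[0.5, 0.7, 0.8],
     [0.3, 0.5, 0.6],
     [0.2, 0.4, 0.5]]

def duel(i, j):
    return i if random.random() < P[i][j] else j

wins = [0, 0, 0]
for _ in range(3000):
    i, j = random.sample(range(3), 2)       # pick a pair of actions
    wins[duel(i, j)] += 1                   # only the winner is revealed

print(wins.index(max(wins)))                # the Condorcet winner wins most
```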
no code implementations • NeurIPS 2016 • Wataru Kumagai
We consider a transfer-learning problem via the parameter transfer approach, where a suitable parameter of a feature mapping is learned on one task and transferred to another target task.
no code implementations • 13 Sep 2014 • Kota Matsui, Wataru Kumagai, Takafumi Kanamori
Our algorithm consists of two steps: a direction estimation step and a search step.
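That two-step structure (not the paper's exact estimator) can be sketched on a toy objective: first estimate a descent direction from finite differences of function values, then search along that direction for the best step size:

```python
import numpy as np

def f(x):
    return float((x ** 2).sum())            # toy objective to minimize

# Step 1: estimate a unit descent direction via central finite differences.
def estimate_direction(x, h=1e-4):
    grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                     for e in np.eye(len(x))])
    return -grad / np.linalg.norm(grad)

# Step 2: search along the direction over a grid of candidate step sizes.
def search_step(x, d, steps=(0.01, 0.1, 1.0, 10.0)):
    return min((f(x + s * d), s) for s in steps)[1]

x = np.array([3.0, 4.0])
for _ in range(20):
    if f(x) < 1e-12:                        # converged; avoid a zero gradient
        break
    d = estimate_direction(x)
    x = x + search_step(x, d) * d

print(round(f(x), 6))                       # near 0: the minimum is reached
```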