Search Results for author: Takumi Tanabe

Found 3 papers, 2 papers with code

Stepwise Alignment for Constrained Language Model Policy Optimization

no code implementations • 17 Apr 2024 • Akifumi Wachi, Thien Q Tran, Rei Sato, Takumi Tanabe, Yohei Akimoto

This paper formulates human value alignment as a language model policy optimization problem that maximizes reward under a safety constraint, and then proposes an algorithm called Stepwise Alignment for Constrained Policy Optimization (SACPO).

Computational Efficiency • Language Modelling

Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

1 code implementation • 7 Nov 2022 • Takumi Tanabe, Rei Sato, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto

In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set.

Level Generation for Angry Birds with Sequential VAE and Latent Variable Evolution

1 code implementation • 13 Apr 2021 • Takumi Tanabe, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto

When ML techniques are applied to game domains with non-tile-based level representation, such as Angry Birds, where objects in a level are specified by real-valued parameters, ML often fails to generate playable levels.
