no code implementations • 17 Apr 2024 • Akifumi Wachi, Thien Q. Tran, Rei Sato, Takumi Tanabe, Youhei Akimoto
This paper formulates human value alignment as a language model policy optimization problem that maximizes reward under a safety constraint, and then proposes an algorithm called Stepwise Alignment for Constrained Policy Optimization (SACPO).
1 code implementation • 31 Jan 2023 • Rei Sato, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto
We investigate policy transfer using image-to-semantics translation to mitigate learning difficulties in vision-based robotics control agents.
1 code implementation • 7 Nov 2022 • Takumi Tanabe, Rei Sato, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto
In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set.
1 code implementation • 11 Dec 2020 • Rei Sato, Jun Sakuma, Youhei Akimoto
In this paper, we propose a novel search strategy for one-shot and sparse-propagation neural architecture search (NAS), namely AdvantageNAS, which further reduces the time complexity of NAS by reducing the number of search iterations.
1 code implementation • 30 Aug 2019 • Rei Sato, Tetsuro Nikuni, Shohei Watabe
We investigate a quantum spatial search problem on fractal lattices, such as Sierpinski carpets and Menger sponges.