Search Results for author: Toshiki Kataoka

Found 4 papers, 2 papers with code

Entropy Controllable Direct Preference Optimization

no code implementations • 12 Nov 2024 • Motoki Omura, Yasuhiro Fujita, Toshiki Kataoka

In the post-training of large language models (LLMs), Reinforcement Learning from Human Feedback (RLHF) is an effective approach to achieve generation aligned with human preferences.
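The paper builds on Direct Preference Optimization (DPO). As context for the title, the sketch below shows the standard DPO objective that such work modifies, not the entropy-controllable variant itself; the function name and argument layout are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss computed from sequence log-probabilities.

    Each argument is a tensor of shape (batch,) holding the summed
    log-probability of the chosen / rejected response under the policy
    being trained or the frozen reference model. `beta` scales the
    implicit KL-style regularization toward the reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-sigmoid of the reward margin between chosen and rejected.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```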

ChainerRL: A Deep Reinforcement Learning Library

1 code implementation • 9 Dec 2019 • Yasuhiro Fujita, Prabhat Nagarajan, Toshiki Kataoka, Takahiro Ishikawa

In this paper, we introduce ChainerRL, an open-source deep reinforcement learning (DRL) library built using Python and the Chainer deep learning framework.

Deep Reinforcement Learning • reinforcement-learning +1
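As an illustration of how a DRL library like this is typically used, the sketch below trains a Double DQN agent on CartPole in the style of the ChainerRL quickstart. Class names, arguments, and the training-loop methods are assumptions based on the public API around the 2019 release and may differ between versions.

```python
# Minimal sketch (assumed API): Double DQN on CartPole with ChainerRL.
import chainer
import chainer.functions as F
import chainer.links as L
import chainerrl
import gym
import numpy as np

class QFunction(chainer.Chain):
    """Small MLP mapping observations to discrete action values."""
    def __init__(self, obs_size, n_actions, n_hidden=50):
        super().__init__()
        with self.init_scope():
            self.l0 = L.Linear(obs_size, n_hidden)
            self.l1 = L.Linear(n_hidden, n_hidden)
            self.l2 = L.Linear(n_hidden, n_actions)

    def __call__(self, x, test=False):
        h = F.tanh(self.l0(x))
        h = F.tanh(self.l1(h))
        return chainerrl.action_value.DiscreteActionValue(self.l2(h))

env = gym.make('CartPole-v0')
obs_size = env.observation_space.shape[0]
n_actions = env.action_space.n

q_func = QFunction(obs_size, n_actions)
optimizer = chainer.optimizers.Adam(eps=1e-2)
optimizer.setup(q_func)

agent = chainerrl.agents.DoubleDQN(
    q_func, optimizer,
    replay_buffer=chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 5),
    gamma=0.95,
    explorer=chainerrl.explorers.ConstantEpsilonGreedy(
        epsilon=0.3, random_action_func=env.action_space.sample),
    replay_start_size=500,
    target_update_interval=100,
    phi=lambda x: x.astype(np.float32, copy=False))

for episode in range(50):
    obs = env.reset()
    reward, done, total = 0.0, False, 0.0
    while not done:
        action = agent.act_and_train(obs, reward)
        obs, reward, done, _ = env.step(action)
        total += reward
    agent.stop_episode_and_train(obs, reward, done)
    print('episode', episode, 'return', total)
```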
