no code implementations • 12 Nov 2024 • Motoki Omura, Yasuhiro Fujita, Toshiki Kataoka
In the post-training of large language models (LLMs), Reinforcement Learning from Human Feedback (RLHF) is an effective approach to achieve generation aligned with human preferences.
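RLHF pipelines typically begin by fitting a reward model to pairwise human preference data. As an illustration only (this is the standard Bradley-Terry preference loss, not necessarily the specific method of this paper), a minimal sketch:

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Negative log-likelihood under the Bradley-Terry model:
    P(chosen preferred over rejected) = sigmoid(r_chosen - r_rejected).
    r_chosen / r_rejected are scalar reward-model scores."""
    diff = r_chosen - r_rejected
    return -np.log(1.0 / (1.0 + np.exp(-diff)))

# The loss shrinks as the reward model scores the preferred
# response higher than the rejected one.
print(preference_loss(0.0, 0.0))  # ties give log(2) ≈ 0.693
print(preference_loss(2.0, 0.0))  # clear margin gives a smaller loss
```

Minimizing this loss over a preference dataset yields the reward signal that the subsequent RL stage optimizes.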
no code implementations • 10 Oct 2024 • Preferred Elements: Kenshin Abe, Kaizaburo Chubachi, Yasuhiro Fujita, Yuta Hirokawa, Kentaro Imajo, Toshiki Kataoka, Hiroyoshi Komatsu, Hiroaki Mikami, Tsuguo Mogami, Shogo Murai, Kosuke Nakago, Daisuke Nishino, Toru Ogawa, Daisuke Okanohara, Yoshihiko Ozaki, Shotaro Sano, Shuji Suzuki, Tianqi Xu, Toshihiko Yanase
We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency.
1 code implementation • 9 Dec 2019 • Yasuhiro Fujita, Prabhat Nagarajan, Toshiki Kataoka, Takahiro Ishikawa
In this paper, we introduce ChainerRL, an open-source deep reinforcement learning (DRL) library built using Python and the Chainer deep learning framework.
38 code implementations • ICLR 2018 • Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida
One of the challenges in the study of generative adversarial networks is the instability of their training.
Ranked #26 on Image Generation on STL-10
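This ICLR 2018 paper proposes spectral normalization, which stabilizes GAN training by dividing each discriminator weight matrix by its largest singular value, estimated cheaply with power iteration. A minimal NumPy sketch of that estimate (illustrative; real implementations reuse the power-iteration vectors across training steps):

```python
import numpy as np

def spectral_normalize(W, n_iter=30, eps=1e-12):
    """Scale W so its spectral norm (largest singular value) is ~1,
    using power iteration to estimate that singular value."""
    u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v) + eps
        u = W @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ W @ v  # estimated largest singular value
    return W / sigma

W = np.random.default_rng(1).standard_normal((4, 3))
W_sn = spectral_normalize(W)
print(np.linalg.svd(W_sn, compute_uv=False)[0])  # close to 1.0
```

Constraining the spectral norm bounds the Lipschitz constant of the discriminator, which is the stabilizing property the paper exploits.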