no code implementations • 26 Oct 2024 • Yuting Tang, Xin-Qiang Cai, Jing-Cheng Pang, Qiyu Wu, Yao-Xiang Ding, Masashi Sugiyama
In this paper, we introduce the problem of RL from Composite Delayed Reward (RLCoDe), which generalizes traditional RL from delayed rewards by eliminating the strong assumption that the delayed reward is a simple sum of Markovian per-step rewards.
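To make the setting concrete, the snippet below is a minimal, hypothetical sketch of a composite delayed reward (not the paper's code): the per-step rewards stay hidden, and the learner observes only one episode-level reward whose aggregation is deliberately not a plain sum. All names and the aggregation rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def composite_reward(step_rewards):
    # Nonlinear aggregation: the episode reward is NOT a plain sum of the
    # hidden per-step rewards, which is the assumption RLCoDe drops.
    r = np.asarray(step_rewards, dtype=float)
    return 0.7 * r.max() + 0.3 * r.mean()

def run_episode(horizon=50):
    # Hidden per-step rewards; the agent never observes them individually.
    hidden = rng.normal(size=horizon)
    return composite_reward(hidden)  # single delayed, composite signal

print(run_episode())
```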
no code implementations • 14 Apr 2024 • Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu
Furthermore, KALM enables the LLM to grasp environmental dynamics, generating meaningful imaginary rollouts that reflect novel skills and demonstrate a seamless integration of large language models and reinforcement learning.
no code implementations • 6 Feb 2024 • Jing-Cheng Pang, Heng-Bo Fan, Pengyuan Wang, Jia-Hao Xiao, Nan Tang, Si-Hang Yang, Chengxing Jia, Sheng-Jun Huang, Yang Yu
The rise of large language models (LLMs) has revolutionized the way we interact with artificial intelligence systems through natural language.
no code implementations • 23 May 2023 • Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu
We demonstrate that SIRLC can be applied to various NLP tasks, such as reasoning problems, text generation, and machine translation.
no code implementations • 18 Feb 2023 • Jing-Cheng Pang, Xin-Yu Yang, Si-Hang Yang, Yang Yu
To ease the policy's learning burden, we investigate an inside-out scheme for natural-language-conditioned RL by developing a task language (TL) that is task-related and unique.
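The toy below illustrates the inside-out idea under my own assumptions (it is not the paper's code): many natural-language instructions collapse to one canonical task-language token, so the policy conditions on a compact, unique task representation instead of raw language. The vocabulary, lookup table, and stub policy are all hypothetical.

```python
TL_VOCAB = {"open_door", "pick_cup"}  # one unique TL token per task

# The NL->TL translator is learned in practice; here it is a plain lookup.
NL_TO_TL = {
    "open the door": "open_door",
    "please pull the door open": "open_door",
    "grab the cup": "pick_cup",
    "pick up the mug on the table": "pick_cup",
}

def policy(observation, tl_token):
    """The policy sees the TL token instead of raw language (stub)."""
    assert tl_token in TL_VOCAB
    return 0  # placeholder action

action = policy(observation=None, tl_token=NL_TO_TL["grab the cup"])
```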
no code implementations • 18 May 2021 • Jing-Cheng Pang, Tian Xu, Shengyi Jiang, Yu-Ren Liu, Yang Yu
To tackle the issue of limited action execution in RL, this paper first formalizes the problem as a Sparse Action Markov Decision Process (SA-MDP), in which specific actions in the action space can be executed only a limited number of times.
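A hypothetical toy environment in this spirit (not the paper's benchmark): one powerful "sparse" action carries a per-episode execution budget, and the remaining budget is folded into the state so the process stays Markovian. The class name, dynamics, and rewards are assumptions for illustration.

```python
class SparseActionToyEnv:
    """1-D chain where action 1 (a big jump) may be used at most `budget` times."""

    def __init__(self, budget=3, goal=10):
        self.budget, self.goal = budget, goal
        self.reset()

    def reset(self):
        self.pos, self.remaining = 0, self.budget
        return (self.pos, self.remaining)  # budget is part of the state

    def step(self, action):
        if action == 1 and self.remaining > 0:
            self.pos += 5               # powerful but budget-limited action
            self.remaining -= 1
        else:
            self.pos += 1               # ordinary action, always available
        done = self.pos >= self.goal
        reward = 1.0 if done else -0.1  # encourages reaching the goal quickly
        return (self.pos, self.remaining), reward, done

env = SparseActionToyEnv()
state = env.reset()
state, reward, done = env.step(1)
```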
1 code implementation • NeurIPS 2021 • Xu-Hui Liu, Zhenghai Xue, Jing-Cheng Pang, Shengyi Jiang, Feng Xu, Yang Yu
In reinforcement learning, experience replay stores past samples for further reuse.
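For context, a standard uniform replay buffer looks like the sketch below. This is the generic baseline, not the reuse strategy the paper proposes; the class and method names are conventional rather than taken from the paper.

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform FIFO experience replay: store transitions, sample minibatches."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest samples evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        return tuple(zip(*batch))  # column-wise: (states, actions, ...)
```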
no code implementations • 27 Nov 2019 • Rong-Jun Qin, Jing-Cheng Pang, Yang Yu
However, learning to beat a pool in stochastic games, i.e., a wide distribution over policy models, is either sample-consuming or insufficient to exploit all models with a limited number of samples.
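The sample-allocation tension can be illustrated with a generic baseline (not the paper's method): under a limited rollout budget, spend more training games on the pool members the current policy is weakest against, rather than sampling opponents uniformly. The `play_and_learn` callback and the pool representation below are assumptions.

```python
import random

def allocate_rollouts(pool, play_and_learn, budget=300):
    wins = [0.0] * len(pool)
    games = [1e-6] * len(pool)   # avoid division by zero before first game
    for _ in range(budget):
        # Play against the opponent with the lowest empirical win rate.
        i = min(range(len(pool)), key=lambda k: wins[k] / games[k])
        wins[i] += play_and_learn(pool[i])  # returns 1.0 on a win, else 0.0
        games[i] += 1
    return [w / g for w, g in zip(wins, games)]  # empirical win rates

# Toy usage: each opponent is reduced to a fixed win probability for our policy.
pool = [0.3, 0.5, 0.7]
print(allocate_rollouts(pool, lambda p: float(random.random() < p)))
```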