no code implementations • ICML 2020 • Rundong Wang, Xu He, Runsheng Yu, Wei Qiu, Bo An, Zinovi Rabinovich
Under the limited bandwidth constraint, a communication protocol is required to generate informative messages.
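The idea of communicating under a bandwidth budget can be sketched as a gating rule that only broadcasts a message when it is informative enough. This is a generic illustration, not the paper's protocol: the variance-based informativeness score, the threshold, and the mean-pooling compression are all assumptions.

```python
import numpy as np

def gated_broadcast(observations, threshold=0.5):
    """Send an agent's message only when its informativeness score
    exceeds a threshold, reducing channel usage.

    The variance-based score, the threshold, and mean-pooling as the
    compression step are illustrative assumptions."""
    messages = []
    for obs in observations:
        score = float(np.var(obs))  # proxy for informativeness
        if score > threshold:
            messages.append(obs.mean(axis=-1))  # compressed message
    return messages

rng = np.random.default_rng(0)
# Three agents: one near-constant observation, two high-variance ones.
obs = [rng.normal(0, s, size=64) for s in (0.1, 2.0, 3.0)]
msgs = gated_broadcast(obs, threshold=0.5)  # only the noisy agents send
```

Only two of the three agents clear the threshold, so the channel carries two messages instead of three.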
no code implementations • 18 Nov 2019 • Runsheng Yu, Zhenyu Shi, Xinrun Wang, Rundong Wang, Buhong Liu, Xinwen Hou, Hanjiang Lai, Bo An
Existing value-factorization based Multi-Agent deep Reinforcement Learning (MARL) approaches perform well in various multi-agent cooperative environments under the centralized training and decentralized execution (CTDE) scheme, where all agents are trained together by the centralized value network and each agent executes its policy independently.
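The value-factorization idea under CTDE can be sketched in a few lines: the centralized training signal is a joint Q value decomposed over per-agent utilities, while execution stays decentralized because each agent acts greedily on its own Q. The additive (VDN-style) mixer below is a minimal generic illustration, not the factorization proposed in the paper.

```python
import numpy as np

class VDNMixer:
    """Value-decomposition sketch for CTDE: the joint Q value is the
    sum of per-agent utilities, so training can use the centralized
    sum while each agent acts on its own Q. A generic VDN-style
    illustration, not the paper's method."""

    def joint_q(self, per_agent_q, actions):
        # Centralized training target: sum of the chosen per-agent Q values.
        return sum(float(q[a]) for q, a in zip(per_agent_q, actions))

    def decentralized_actions(self, per_agent_q):
        # Decentralized execution: each agent argmaxes its own Q alone.
        return [int(np.argmax(q)) for q in per_agent_q]

mixer = VDNMixer()
qs = [np.array([0.1, 0.9]), np.array([0.7, 0.2])]
acts = mixer.decentralized_actions(qs)   # each agent picks its own best action
total = mixer.joint_q(qs, acts)          # centralized value of the joint action
```

Because the mixer is monotonic in each agent's utility, the decentralized argmax of each agent coincides with the argmax of the joint value, which is what makes CTDE execution consistent with training.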
no code implementations • ICLR 2020 • Zhenyu Shi*, Runsheng Yu*, Xinrun Wang*, Rundong Wang, Youzhi Zhang, Hanjiang Lai, Bo An
The main difficulties of expensive coordination are that i) the leader has to consider the long-term effect and predict the followers' behaviors when assigning bonuses and ii) the complex interactions between followers make the training process hard to converge, especially when the leader's policy changes with time.
no code implementations • 21 Aug 2020 • Xu He, Bo An, Yanghua Li, Haikai Chen, Rundong Wang, Xinrun Wang, Runsheng Yu, Xin Li, Zhirong Wang
Thus, the global policy of the whole page could be sub-optimal.

no code implementations • 9 Dec 2020 • Hongxin Wei, Lei Feng, Rundong Wang, Bo An
Deep neural networks have been shown to easily overfit to biased training data with label noise or class imbalance.
no code implementations • 23 Dec 2020 • Rundong Wang, Hongxin Wei, Bo An, Zhouyan Feng, Jun Yao
Portfolio management via reinforcement learning is at the forefront of fintech research, which explores how to optimally reallocate a fund into different financial assets over the long term by trial-and-error.
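The reallocation loop can be sketched as: the agent's action is a weight vector over assets, and the per-step reward is the log portfolio return net of transaction costs. The proportional cost model and its rate below are illustrative assumptions, not the paper's reward design.

```python
import numpy as np

def step_reward(weights, price_relatives, prev_weights=None, cost=0.0025):
    """One step of RL portfolio management: the action is a weight
    vector over assets; the reward is the log portfolio return net of
    transaction costs. The proportional cost model is an assumption."""
    weights = np.asarray(weights, dtype=float)
    growth = float(weights @ price_relatives)   # portfolio value multiplier
    if prev_weights is not None:
        turnover = float(np.abs(weights - np.asarray(prev_weights)).sum())
        growth *= (1.0 - cost * turnover)       # penalize reallocation
    return float(np.log(growth))

# Equal weights; one asset rises 10%, the other falls 5%.
r = step_reward([0.5, 0.5], np.array([1.10, 0.95]))
```

Using log returns makes the episode return additive across steps, which is why it is a common reward choice for long-horizon trading agents.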
no code implementations • 1 Jan 2021 • Wei Qiu, Xinrun Wang, Runsheng Yu, Xu He, Rundong Wang, Bo An, Svetlana Obraztsova, Zinovi Rabinovich
Centralized training with decentralized execution (CTDE) has become an important paradigm in multi-agent reinforcement learning (MARL).

no code implementations • 8 Jan 2021 • Runsheng Yu, Yu Gong, Rundong Wang, Bo An, Qingwen Liu, Wenwu Ou
Firstly, we introduce a novel training scheme with two value functions to maximize the accumulated long-term reward under the safety constraint.
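A two-value-function scheme for constrained RL can be sketched with one critic estimating long-term reward and another estimating long-term safety cost, combined through a Lagrangian penalty. The penalty form, the multiplier, and the cost limit below are assumptions for illustration; the paper's training scheme may differ.

```python
import numpy as np

def constrained_objective(q_reward, q_cost, lam, cost_limit):
    """Safe-RL sketch with two value functions: one critic estimates
    accumulated reward, the other accumulated safety cost; they are
    combined via a Lagrangian penalty. The penalty form and the
    parameter values are illustrative assumptions."""
    return np.asarray(q_reward) - lam * (np.asarray(q_cost) - cost_limit)

def select_action(q_reward, q_cost, lam=1.0, cost_limit=0.1):
    # Pick the action maximizing reward penalized by constraint violation.
    return int(np.argmax(constrained_objective(q_reward, q_cost, lam, cost_limit)))

# The higher-reward action is far too costly, so the safe one wins.
a = select_action(q_reward=[1.0, 2.0], q_cost=[0.0, 5.0])
```

In practice the multiplier would itself be adapted (raised when the cost estimate exceeds the limit) rather than fixed as here.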
no code implementations • 16 Feb 2021 • Wei Qiu, Xinrun Wang, Runsheng Yu, Xu He, Rundong Wang, Bo An, Svetlana Obraztsova, Zinovi Rabinovich
Current value-based multi-agent reinforcement learning methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE).

no code implementations • 28 Sep 2021 • Shuo Sun, Rundong Wang, Bo An
RL's impact is pervasive; it has recently demonstrated the ability to conquer many challenging quantitative trading (QT) tasks.
no code implementations • 29 Sep 2021 • Zhuoyi Lin, Biao Ye, Xu He, Shuo Sun, Rundong Wang, Rui Yin, Xu Chi, Chee Keong Kwoh
A machine learning system is typically composed of a model and data.
no code implementations • NeurIPS 2021 • Wei Qiu, Xinrun Wang, Runsheng Yu, Rundong Wang, Xu He, Bo An, Svetlana Obraztsova, Zinovi Rabinovich
Current value-based multi-agent reinforcement learning methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE).

no code implementations • 15 Dec 2021 • Shuo Sun, Wanqi Xue, Rundong Wang, Xu He, Junlei Zhu, Jian Li, Bo An
Reinforcement learning (RL) techniques have shown great success in many challenging quantitative trading tasks, such as portfolio management and algorithmic trading.
no code implementations • 14 Jan 2022 • Zhuoyi Lin, Sheng Zang, Rundong Wang, Zhu Sun, J. Senthilnath, Chi Xu, Chee-Keong Kwoh
We then introduce a dynamic transformer encoder (DTE) to capture user-specific inter-item relationships among item candidates by seamlessly accommodating the learned latent user intentions via IDM.
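The idea of conditioning item encoding on a learned user intention can be sketched as attention in which the intention vector plays the query role over the item candidates. This only illustrates the IDM-conditioning idea; the actual dynamic transformer encoder is a full transformer, and the embeddings below are toy assumptions.

```python
import numpy as np

def intention_attention(item_embs, user_intention):
    """Sketch of attending over item candidates conditioned on a
    learned user intention: the intention vector acts as the query in
    scaled dot-product attention. A simplification of the DTE idea,
    not its actual architecture."""
    d = item_embs.shape[-1]
    scores = item_embs @ user_intention / np.sqrt(d)
    weights = np.exp(scores - scores.max())     # stable softmax
    weights /= weights.sum()
    return weights @ item_embs                  # intention-weighted summary

items = np.eye(3)                   # three orthogonal toy item embeddings
intent = np.array([5.0, 0.0, 0.0])  # intention aligned with the first item
summary = intention_attention(items, intent)
```

The summary concentrates mass on the item most aligned with the intention, which is the inter-item relationship the attention weights are meant to expose.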
no code implementations • 27 May 2022 • Wei Qiu, Weixun Wang, Rundong Wang, Bo An, Yujing Hu, Svetlana Obraztsova, Zinovi Rabinovich, Jianye Hao, Yingfeng Chen, Changjie Fan
During an action's execution duration, the environment changes are influenced by, but not synchronised with, that action's execution.

no code implementations • 7 Jun 2022 • Shuo Sun, Rundong Wang, Bo An
To tackle these two limitations, we first reformulate quantitative investment as a multi-task learning problem.
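A multi-task reformulation can be sketched as a shared encoder feeding separate per-task heads, so related quantitative-investment tasks share one representation. The linear layers, dimensions, and task names below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class MultiTaskModel:
    """Multi-task learning sketch: a shared encoder produces one
    representation, and each task gets its own head. Layers and task
    names are illustrative assumptions."""

    def __init__(self, in_dim, hidden, tasks):
        self.shared = rng.normal(size=(in_dim, hidden)) * 0.1
        self.heads = {t: rng.normal(size=(hidden,)) * 0.1 for t in tasks}

    def forward(self, x, task):
        h = np.tanh(x @ self.shared)           # representation shared by all tasks
        return float(h @ self.heads[task])     # task-specific prediction

model = MultiTaskModel(in_dim=4, hidden=8, tasks=["return", "risk"])
x = np.ones(4)
preds = {t: model.forward(x, t) for t in ("return", "risk")}
```

Gradients from every task would flow into the shared encoder during training, which is the mechanism by which the tasks regularize each other.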
no code implementations • 7 Feb 2023 • Rundong Wang, Longtao Zheng, Wei Qiu, Bowei He, Bo An, Zinovi Rabinovich, Yujing Hu, Yingfeng Chen, Tangjie Lv, Changjie Fan
Despite its success, ACL's applicability is limited by (1) the lack of a general student framework for dealing with the varying number of agents across tasks and the sparse reward problem, and (2) the non-stationarity of the teacher's task due to ever-changing student strategies.

no code implementations • 23 Apr 2023 • Yiming Gao, Feiyu Liu, Liang Wang, Zhenjie Lian, Weixuan Wang, Siqin Li, Xianliang Wang, Xianhan Zeng, Rundong Wang, Jiawei Wang, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu
MOBA games, e.g., Dota 2 and Honor of Kings, have been actively used as testbeds for recent AI research on games, and various AI systems reaching human-level performance have been developed so far.
1 code implementation • 13 Jun 2023 • Longtao Zheng, Rundong Wang, Xinrun Wang, Bo An
To address these challenges, we introduce Synapse, a computer agent featuring three key components: i) state abstraction, which filters out task-irrelevant information from raw states, allowing more exemplars within the limited context, ii) trajectory-as-exemplar prompting, which prompts the LLM with complete trajectories of the abstracted states and actions to improve multi-step decision-making, and iii) exemplar memory, which stores the embeddings of exemplars and retrieves them via similarity search for generalization to novel tasks.
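The third component, exemplar memory, can be sketched as a store of trajectory embeddings queried by cosine similarity. The toy 2-D embeddings below are assumptions for illustration; Synapse itself uses learned text embeddings of abstracted states and actions.

```python
import numpy as np

class ExemplarMemory:
    """Sketch of an exemplar memory: store embeddings of past
    trajectories and retrieve the most similar ones for a new task
    via cosine similarity. The embedding function is assumed; this is
    a simplification of Synapse's component, not its implementation."""

    def __init__(self):
        self.keys, self.trajectories = [], []

    def add(self, embedding, trajectory):
        self.keys.append(np.asarray(embedding, dtype=float))
        self.trajectories.append(trajectory)

    def retrieve(self, query, k=1):
        query = np.asarray(query, dtype=float)
        sims = [key @ query / (np.linalg.norm(key) * np.linalg.norm(query))
                for key in self.keys]
        order = np.argsort(sims)[::-1][:k]       # highest similarity first
        return [self.trajectories[i] for i in order]

mem = ExemplarMemory()
mem.add([1.0, 0.0], "traj-login-form")      # hypothetical stored trajectories
mem.add([0.0, 1.0], "traj-search-flight")
best = mem.retrieve([0.9, 0.1], k=1)        # query closest to the first exemplar
```

The retrieved trajectories would then be placed into the prompt as complete state-action exemplars, which is what enables generalization to novel but similar tasks.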