no code implementations • 11 Aug 2022 • Ke Xu, Jianqiao Wangni, Yifan Zhang, Deheng Ye, Jiaxiang Wu, Peilin Zhao
Therefore, a threshold quantization strategy with a relatively small error is adopted in QCMD adagrad and QRDA adagrad to improve the signal-to-noise ratio and preserve the sparsity of the model.
no code implementations • 12 May 2022 • Qianggang Ding, Deheng Ye, Tingyang Xu, Peilin Zhao
To the best of our knowledge, our method is the first GNN-based bilevel optimization framework for resolving this task.
no code implementations • 17 Feb 2022 • Anssi Kanervisto, Stephanie Milani, Karolis Ramanauskas, Nicholay Topin, Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang, Weijun Hong, Zhongyue Huang, Haicheng Chen, Guangjun Zeng, Yue Lin, Vincent Micheli, Eloi Alonso, François Fleuret, Alexander Nikulin, Yury Belousov, Oleg Svidchenko, Aleksei Shpilman
With this in mind, we hosted the third edition of the MineRL ObtainDiamond competition, MineRL Diamond 2021, with a separate track in which we permitted any solution to promote the participation of newcomers.
no code implementations • 7 Dec 2021 • Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang
To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration.
Efficient Exploration
Hierarchical Reinforcement Learning
+3
1 code implementation • NeurIPS 2021 • Zifan Wu, Chao Yu, Deheng Ye, Junge Zhang, Haiyin Piao, Hankz Hankui Zhuo
We present Coordinated Proximal Policy Optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting.
no code implementations • NeurIPS 2021 • Yiming Gao, Bei Shi, Xueying Du, Liang Wang, Guangwei Chen, Zhenjie Lian, Fuhao Qiu, Guoan Han, Weixuan Wang, Deheng Ye, Qiang Fu, Wei Yang, Lanxiao Huang
Recently, many researchers have made successful progress in building the AI systems for MOBA-game-playing with deep reinforcement learning, such as on Dota 2 and Honor of Kings.
1 code implementation • 9 Oct 2021 • Shiyu Huang, Wenze Chen, Longfei Zhang, Shizhen Xu, Ziyang Li, Fengming Zhu, Deheng Ye, Ting Chen, Jun Zhu
To the best of our knowledge, Tikick is the first learning-based AI system that can take over the multi-agent Google Research Football full game, while previous work could either control a single agent or experiment on toy academic scenarios.
no code implementations • 19 Jun 2021 • Hua Wei, Deheng Ye, Zhao Liu, Hao Wu, Bo Yuan, Qiang Fu, Wei Yang, Zhenhui Li
While most research focuses on the state-action function part through reducing the bootstrapping error in value function approximation induced by the distribution shift of training data, the effects of error propagation in generative modeling have been neglected.
1 code implementation • 13 May 2021 • Menghui Zhu, Minghuan Liu, Jian Shen, Zhicheng Zhang, Sheng Chen, Weinan Zhang, Deheng Ye, Yong Yu, Qiang Fu, Wei Yang
In Goal-oriented Reinforcement learning, relabeling the raw goals in past experience to provide agents with hindsight ability is a major solution to the reward sparsity problem.
no code implementations • 5 Jan 2021 • Jiamou Sun, Zhenchang Xing, Hao Guo, Deheng Ye, Xiaohong Li, Xiwei Xu, Liming Zhu
The extracted aspects from an ExploitDB post are then composed into a CVE description according to the suggested CVE description templates, which is must-provided information for requesting new CVEs.
no code implementations • 18 Dec 2020 • Sheng Chen, Menghui Zhu, Deheng Ye, Weinan Zhang, Qiang Fu, Wei Yang
Hero drafting is essential in MOBA game playing as it builds the team of each side and directly affects the match outcome.
no code implementations • NeurIPS 2020 • Deheng Ye, Guibin Chen, Wen Zhang, Sheng Chen, Bo Yuan, Bo Liu, Jia Chen, Zhao Liu, Fuhao Qiu, Hongsheng Yu, Yinyuting Yin, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu
However, existing work falls short in handling the raw game complexity caused by the explosion of agent combinations, i. e., lineups, when expanding the hero pool in case that OpenAI's Dota AI limits the play to a pool of only 17 heroes.
no code implementations • 25 Nov 2020 • Deheng Ye, Guibin Chen, Peilin Zhao, Fuhao Qiu, Bo Yuan, Wen Zhang, Sheng Chen, Mingfei Sun, Xiaoqian Li, Siqin Li, Jing Liang, Zhenjie Lian, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang
Unlike prior attempts, we integrate the macro-strategy and the micromanagement of MOBA-game-playing into neural networks in a supervised and end-to-end manner.
2 code implementations • IJCAI 2020 • Ke Xu, Yifan Zhang, Deheng Ye, Peilin Zhao, Mingkui Tan
One of the key issues is how to represent the non-stationary price series of assets in a portfolio, which is important for portfolio decisions.
no code implementations • 20 Dec 2019 • Deheng Ye, Zhao Liu, Mingfei Sun, Bei Shi, Peilin Zhao, Hao Wu, Hongsheng Yu, Shaojie Yang, Xipeng Wu, Qingwei Guo, Qiaobo Chen, Yinyuting Yin, Hao Zhang, Tengfei Shi, Liang Wang, Qiang Fu, Wei Yang, Lanxiao Huang
We study the reinforcement learning problem of complex action control in the Multi-player Online Battle Arena (MOBA) 1v1 games.