no code implementations • 8 Mar 2024 • Jingxiao Chen, Ziqin Gong, Minghuan Liu, Jun Wang, Yong Yu, Weinan Zhang
To overcome this problem and to have an effective solution against hard constraints, we proposed a novel learning-based method that uses looking-ahead information as the feature to improve the legality of TSP with Time Windows (TSPTW) solutions.
no code implementations • 29 Feb 2024 • Jingxiao Chen, Weiji Xie, Weinan Zhang, Yong Yu, Ying Wen
Firstly, unaware of the game structure, it is impossible to interact with the opponents and conduct a major learning paradigm, self-play, for competitive games.
no code implementations • 8 Oct 2023 • Xihuai Wang, Shao Zhang, WenHao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, Weinan Zhang
Current evaluation methods for ZSC capability still need to improve in constructing diverse evaluation partners and comprehensively measuring the ZSC capability.
1 code implementation • 24 Dec 2022 • Ying Wen, Ziyu Wan, Ming Zhou, Shufang Hou, Zhe Cao, Chenyang Le, Jingxiao Chen, Zheng Tian, Weinan Zhang, Jun Wang
The pervasive uncertainty and dynamic nature of real-world environments present significant challenges for the widespread implementation of machine-driven Intelligent Decision-Making (IDM) systems.
1 code implementation • 18 Sep 2022 • Hua Wei, Jingxiao Chen, Xiyang Ji, Hongyang Qin, Minwen Deng, Siqin Li, Liang Wang, Weinan Zhang, Yong Yu, Lin Liu, Lanxiao Huang, Deheng Ye, Qiang Fu, Wei Yang
Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning.
no code implementations • 28 Jan 2022 • Ming Zhou, Jingxiao Chen, Ying Wen, Weinan Zhang, Yaodong Yang, Yong Yu, Jun Wang
Policy Space Response Oracle methods (PSRO) provide a general solution to learn Nash equilibrium in two-player zero-sum games but suffer from two drawbacks: (1) the computation inefficiency due to the need for consistent meta-game evaluation via simulations, and (2) the exploration inefficiency due to finding the best response against a fixed meta-strategy at every epoch.
1 code implementation • ACL 2021 • Runzhe Yang, Jingxiao Chen, Karthik Narasimhan
In this paper, we explore the ability to model and infer personality types of opponents, predict their responses, and use this information to adapt a dialog agent's high-level strategy in negotiation tasks.