no code implementations • 13 Jun 2025 • Qirui Mi, Qipeng Yang, Zijun Fan, Wentian Fan, Heyang Ma, Chengdong Ma, Siyu Xia, Bo An, Jun Wang, Haifeng Zhang
Artificial intelligence (AI) has become a powerful tool for economic research, enabling large-scale simulation and policy optimization.
no code implementations • 26 Feb 2025 • Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang
How to align large language models (LLMs) with user preferences from a static general dataset has been frequently studied.
no code implementations • 22 Oct 2024 • Mingzhi Wang, Chengdong Ma, Qizhi Chen, Linjian Meng, Yang Han, Jiancong Xiao, Zhaowei Zhang, Jing Huo, Weijie J. Su, Yaodong Yang
Self-play methods have demonstrated remarkable success in enhancing model capabilities across various domains.
no code implementations • 2 Aug 2024 • Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Wenhao Tang, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, Yu Wang
Self-play, characterized by agents' interactions with copies or past versions of themselves, has recently gained prominence in reinforcement learning (RL).
Multi-agent Reinforcement Learning
reinforcement-learning
no code implementations • 31 May 2024 • Jiesong Lian, Yucong Huang, Chengdong Ma, Mingzhi Wang, Ying Wen, Long Hu, Yixue Hao
For solving zero-sum games involving non-transitivity, a useful approach is to maintain a policy population to approximate the Nash Equilibrium (NE).
no code implementations • 14 Mar 2024 • Qirui Mi, Zhiyu Zhao, Chengdong Ma, Siyu Xia, Yan Song, Mengyue Yang, Jun Wang, Haifeng Zhang
Macroeconomic outcomes emerge from individuals' decisions, making it essential to model how agents interact with macro policy via consumption, investment, and labor choices.
no code implementations • 20 Feb 2024 • Zhaowei Zhang, Fengshuo Bai, Mingzhi Wang, Haoyang Ye, Chengdong Ma, Yaodong Yang
The burgeoning integration of artificial intelligence (AI) into human society brings forth significant implications for societal governance and safety.
no code implementations • 3 Feb 2024 • Yifan Zhong, Chengdong Ma, Xiaoyuan Zhang, Ziran Yang, Haojun Chen, Qingfu Zhang, Siyuan Qi, Yaodong Yang
Panacea trains a single model capable of adapting online and Pareto-optimally to diverse sets of preferences without the need for further tuning.
no code implementations • 30 Sep 2023 • Chengdong Ma, Ziran Yang, Hai Ci, Jun Gao, Minquan Gao, Xuehai Pan, Yaodong Yang
Furthermore, we develop a Gamified Red Team Solver (GRTS) with diversity measures to mitigate mode collapse and theoretically guarantee convergence to an approximate Nash equilibrium, which results in better strategies for both teams.
no code implementations • 13 Jul 2022 • Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang
Reinforcement learning algorithms require a large number of samples; this often limits their real-world applications on even simple tasks.
1 code implementation • 1 Nov 2020 • Yaodong Yang, Chengdong Ma, Zihan Ding, Stephen McAleer, Chi Jin, Jun Wang
In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier.
Multi-agent Reinforcement Learning
reinforcement-learning