no code implementations • 4 Feb 2024 • Yifeng He, Jiabo Huang, Yuyang Rong, Yiwen Guo, Ethan Wang, Hao Chen
The remarkable capability of large language models (LLMs) in generating high-quality code has drawn increasing attention in the software testing community.
1 code implementation • NeurIPS 2021 • Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao
Theoretically, under a weak coverage assumption that the experience dataset contains enough information about the optimal policy, we prove that for an episodic mean-field MDP with a horizon $H$ and $N$ training trajectories, SAFARI attains a sub-optimality gap of $\mathcal{O}(H^2d_{\rm eff} /\sqrt{N})$, where $d_{\rm eff}$ is the effective dimension of the function class for parameterizing the value function, but independent on the number of agents.
no code implementations • 29 Sep 2021 • Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha
To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation- invariant actor-critic neural architecture.
no code implementations • 18 May 2021 • Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha
To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.