no code implementations • 9 Mar 2024 • Min Cheng, Ruida Zhou, P. R. Kumar, Chao Tian
We prove that algorithms based on independent policy gradient and on independent natural policy gradient both converge globally to a Nash equilibrium under the average-reward criterion.
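As a rough illustration of the independent-policy-gradient idea (not the paper's average-reward algorithm), the sketch below runs two players who each ascend their own exact policy gradient in a toy 2x2 common-interest matrix game; the payoff matrix, learning rate, and iteration count are all assumptions for illustration.

```python
import numpy as np

# Toy common-interest game (assumed for illustration):
# both players receive payoff A[a1, a2].
A = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

theta1 = np.zeros(2)  # player 1's action logits
theta2 = np.zeros(2)  # player 2's action logits
lr = 0.5

for _ in range(500):
    p1, p2 = softmax(theta1), softmax(theta2)
    # Each player's per-action expected payoff against the other's
    # current (independent) policy.
    v1 = A @ p2
    v2 = A.T @ p1
    # Exact policy-gradient step for softmax policies:
    # d/dtheta_k of p @ v equals p_k * (v_k - p @ v).
    theta1 += lr * p1 * (v1 - p1 @ v1)
    theta2 += lr * p2 * (v2 - p2 @ v2)

p1, p2 = softmax(theta1), softmax(theta2)
print(np.round(p1, 3), np.round(p2, 3))
```

In this common-interest game the independent updates concentrate both policies on the higher-payoff joint action, which is a Nash equilibrium; in general-sum games such independent dynamics need not converge, which is part of what makes global-convergence guarantees nontrivial.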
1 code implementation • NeurIPS 2023 • Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian
We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment.
1 code implementation • 4 Apr 2019 • Zhenguo Yang, Zehang Lin, Min Cheng, Qing Li, Wenyin Liu
In this work, we construct and release a multi-domain and multi-modality event dataset (MMED), containing 25,165 textual news articles collected from hundreds of news media sites (e.g., Yahoo News, Google News, CNN News).