no code implementations • 4 Apr 2024 • Jing Dong, Baoxiang Wang, YaoLiang Yu
Our algorithm simultaneously achieves a Nash regret and a regret bound of $O(T^{4/5})$ for potential games, which matches the best available result, without using additional projection steps.
1 code implementation • 19 Aug 2023 • Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
Communication lays the foundation for cooperation in human society and in multi-agent reinforcement learning (MARL).
no code implementations • 19 Jun 2023 • Jing Dong, Jingyu Wu, Siwei Wang, Baoxiang Wang, Wei Chen
The congestion game is a powerful model that encompasses a range of engineering systems such as traffic networks and resource allocation.
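Congestion games are potential games, so sequential best responses converge to a pure Nash equilibrium. A minimal sketch of this convergence, assuming a hypothetical two-link network with load-based latency (not the paper's model):

```python
def best_response_dynamics(num_players=6, rounds=50):
    """Best-response dynamics in a two-link atomic congestion game.

    Each player picks link 0 or 1; the cost of a link is its load
    (number of players using it). Because congestion games admit an
    exact potential function, sequential best responses terminate
    at a pure Nash equilibrium.
    """
    choices = [0] * num_players          # everyone starts on link 0
    for _ in range(rounds):
        changed = False
        for i in range(num_players):
            loads = [choices.count(0), choices.count(1)]
            cur, other = choices[i], 1 - choices[i]
            # Switch if the other link would be strictly cheaper.
            if loads[other] + 1 < loads[cur]:
                choices[i] = other
                changed = True
        if not changed:                  # no profitable deviation: Nash
            break
    return choices
```

With six symmetric players, the dynamics settle on a balanced 3/3 split, where no player can lower their cost by switching.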
1 code implementation • 19 May 2023 • Fang Kong, Jize Xie, Baoxiang Wang, Tao Yao, Shuai Li
The effect is neglected by previous OIM works under IC and linear threshold models.
no code implementations • 18 May 2023 • Wenhao Li, Dan Qiao, Baoxiang Wang, Xiangfeng Wang, Bo Jin, Hongyuan Zha
The difficulty of appropriately assigning credit is particularly heightened in cooperative MARL with sparse reward, due to the concurrent time and structural scales involved.
1 code implementation • NeurIPS 2023 • Yue Lin, Wenhao Li, Hongyuan Zha, Baoxiang Wang
To thrive in those environments, the agent needs to influence other agents so their actions become more helpful and less harmful.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning (+1)
1 code implementation • 23 Feb 2023 • Wenhao Li, Baoxiang Wang, Shanchao Yang, Hongyuan Zha
We propose a simple and effective RL method, Diverse Policy Optimization (DPO), to model the policies in structured action space as the energy-based models (EBM) by following the probabilistic RL framework.
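An energy-based policy assigns each action a probability proportional to the exponentiated negative energy, so low-energy actions dominate while diverse alternatives retain mass. A generic sketch of this policy form (Boltzmann sampling over assumed energies; not the DPO training procedure itself):

```python
import math
import random

def sample_energy_policy(energies, temperature=1.0, rng=None):
    """Sample an action from an energy-based (Boltzmann) policy.

    pi(a) is proportional to exp(-E(a) / T): lower-energy actions are
    more likely, and the temperature T controls how peaked the
    distribution is.
    """
    rng = rng or random.Random(0)
    weights = [math.exp(-e / temperature) for e in energies]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Inverse-CDF sampling from the categorical distribution.
    r, acc = rng.random(), 0.0
    for a, p in enumerate(probs):
        acc += p
        if r < acc:
            return a, probs
    return len(probs) - 1, probs
```

Raising the temperature flattens the distribution, trading off exploitation of the lowest-energy action against policy diversity.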
no code implementations • 14 Feb 2023 • Fang Kong, Xiangcheng Zhang, Baoxiang Wang, Shuai Li
Learning Markov decision processes (MDP) in an adversarial environment has been a challenging problem.
no code implementations • 28 Nov 2022 • Qi Tian, Kun Kuang, Furui Liu, Baoxiang Wang
e.g., one agent follows a random policy while the other agents follow medium-level policies.
no code implementations • 28 Sep 2022 • Jing Dong, Jingwei Li, Baoxiang Wang, Jingzhao Zhang
Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go.
no code implementations • 13 Jun 2022 • Jiawei Xu, Cheng Zhou, Yizheng Zhang, Baoxiang Wang, Lei Han
Integrating the two algorithms yields the complete Relative Policy-Transition Optimization (RPTO) algorithm, in which the policy interacts with both environments simultaneously, so that data collection from the two environments and the policy and transition updates are completed in one closed loop, forming a principled learning framework for policy transfer.
no code implementations • 25 Apr 2022 • Jing Dong, Shiji Zhou, Baoxiang Wang, Han Zhao
We thus study the problem of supervised gradual domain adaptation, where labeled data from shifting distributions are available to the learner along the trajectory, and we aim to learn a classifier on a target data distribution of interest.
no code implementations • 28 Feb 2022 • Jing Dong, Li Shen, Yinggan Xu, Baoxiang Wang
We study the convergence of the actor-critic algorithm with nonlinear function approximation under a nonconvex-nonconcave primal-dual formulation.
no code implementations • 25 Jan 2022 • Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
Temporal difference (TD) learning is a widely used method to evaluate policies in reinforcement learning.
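TD learning evaluates a policy by bootstrapping: each value estimate is nudged toward the reward plus the discounted estimate of the successor state. A minimal tabular TD(0) sketch on a hypothetical random-walk chain (an illustration of the method, not the setting studied in the paper):

```python
import random

def td0_evaluate(num_states, episodes, alpha=0.1, gamma=0.9, seed=0):
    """Tabular TD(0) policy evaluation on a simple random-walk chain.

    States 0..num_states-1; episodes terminate off either end, with
    reward 1 only when exiting past the right end (toy environment).
    """
    rng = random.Random(seed)
    V = [0.0] * num_states
    for _ in range(episodes):
        s = num_states // 2
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            if s_next < 0:
                r, done = 0.0, True
            elif s_next >= num_states:
                r, done = 1.0, True
            else:
                r, done = 0.0, False
            target = r if done else r + gamma * V[s_next]
            V[s] += alpha * (target - V[s])   # TD-error update
            if done:
                break
            s = s_next
    return V
```

After enough episodes the estimates increase monotonically toward the rewarding right end, matching the true value function of the chain.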
no code implementations • 20 Dec 2021 • Qi Tian, Kun Kuang, Baoxiang Wang, Furui Liu, Fei Wu
The node information compression aims to address the problem of what to communicate via learning compact node representations.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning (+3)
1 code implementation • 18 Oct 2021 • Shanchao Yang, Kaili Ma, Baoxiang Wang, Tianshu Yu, Hongyuan Zha
In this case, GNNs can barely learn useful information, resulting in prohibitive difficulty in making actions for successively rewiring edges under a reinforcement learning context.
no code implementations • 9 Sep 2021 • Jing Dong, Shuai Li, Baoxiang Wang
Motivated by the common strategic activities in crowdsourcing labeling, we study the problem of sequentially eliciting information without verification (EIWV) from a heterogeneous and unknown crowd of workers.
no code implementations • 1 Jun 2021 • Jiahui Li, Kun Kuang, Baoxiang Wang, Furui Liu, Long Chen, Fei Wu, Jun Xiao
Specifically, Shapley Value and its desired properties are leveraged in deep MARL to credit any combinations of agents, which grants us the capability to estimate the individual credit for each agent.
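The Shapley value credits each agent with its average marginal contribution over all coalitions, which is what makes it suitable for assigning individual credit in a cooperative team. A small exact-computation sketch (exponential in the number of agents, so only for small teams; a generic illustration, not the paper's deep MARL estimator):

```python
from itertools import combinations
from math import factorial

def shapley_values(agents, coalition_value):
    """Exact Shapley value for each agent.

    coalition_value maps a frozenset of agents to a real payoff.
    Each agent's value is the weighted average of its marginal
    contribution over every coalition it can join.
    """
    n = len(agents)
    phi = {}
    for i in agents:
        others = [a for a in agents if a != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (coalition_value(S | {i}) - coalition_value(S))
        phi[i] = total
    return phi
```

For an additive game, each agent's Shapley value recovers exactly its own contribution, and the values always sum to the grand coalition's payoff (efficiency).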
no code implementations • 24 May 2021 • Kun Wang, Jing Dong, Baoxiang Wang, Shuai Li, Shuo Shao
This paper studies \emph{differential privacy (DP)} and \emph{local differential privacy (LDP)} in cascading bandits.
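The core DP primitive for bandit feedback of this kind is perturbing counts with calibrated noise. A sketch of the textbook Laplace mechanism applied to a click count (sensitivity 1), shown only to illustrate the privacy idea, not the paper's cascading-bandit algorithm:

```python
import math
import random

def private_click_count(clicks, epsilon, seed=0):
    """Differentially private release of a click count.

    Adds Laplace(1/epsilon) noise to the raw count: since adding or
    removing one user changes the count by at most 1, this satisfies
    epsilon-differential privacy for the released statistic.
    """
    rng = random.Random(seed)
    count = sum(clicks)
    scale = 1.0 / epsilon              # sensitivity of a count is 1
    u = rng.random() - 0.5             # inverse-CDF Laplace sampling
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return count + noise
```

Smaller epsilon (stronger privacy) inflates the noise scale, which is precisely the privacy–regret trade-off such bandit analyses quantify.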
1 code implementation • 25 Feb 2021 • Jing Dong, Ke Li, Shuai Li, Baoxiang Wang
Strategic behavior against sequential learning methods, such as "click framing" in real recommendation systems, has been widely observed.
no code implementations • 29 Mar 2020 • Andrej Bogdanov, Baoxiang Wang
In particular, we show that 1. Given a function that has a $k$-partition of cost $\delta$, a partition of cost $\mathcal{O}(k n^2)(\delta + \epsilon)$ can be learned in time $\tilde{\mathcal{O}}(n^2 \mathrm{poly}(1/\epsilon))$ for any $\epsilon > 0$.
no code implementations • ICLR 2020 • Baoxiang Wang, Shuai Li, Jiajin Li, Siu On Chan
We analyze the Gambler's problem, a simple reinforcement learning problem where the gambler has the chance to double or lose their bets until the target is reached.
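The Gambler's problem can be solved exactly by value iteration: at each capital level the gambler maximizes the win probability over all admissible bets. A compact sketch in the standard Sutton–Barto formulation (toy parameters chosen here for illustration):

```python
def gambler_value_iteration(goal=100, p_heads=0.4, theta=1e-9):
    """Value iteration for the Gambler's problem.

    The gambler bets on coin flips with win probability p_heads and
    seeks to reach `goal` units of capital; reward 1 is given only on
    reaching the goal, so V[s] is the probability of winning from s.
    """
    V = [0.0] * (goal + 1)
    V[goal] = 1.0
    while True:
        delta = 0.0
        for s in range(1, goal):
            best = 0.0
            for bet in range(1, min(s, goal - s) + 1):
                v = p_heads * V[s + bet] + (1 - p_heads) * V[s - bet]
                best = max(best, v)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V
```

For a subfair coin ($p < 1/2$), bold play is optimal: from half the target, betting everything gives win probability exactly $p$, which the computed value function reproduces.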
1 code implementation • NeurIPS 2019 • Baoxiang Wang, Nidhi Hegde
Our aim is to protect the value function approximator, without regard to the number of states queried to the function.
no code implementations • 29 May 2019 • Baoxiang Wang
Binary determination of the presence of objects is one of the problems where humans perform extraordinarily better than computer vision systems, in terms of both speed and precision.
1 code implementation • 30 Jan 2019 • Baoxiang Wang, Nidhi Hegde
Our aim is to protect the value function approximator, without regard to the number of states queried to the function.
no code implementations • 27 Sep 2018 • Baoxiang Wang, Tongfang Sun, Xianjun Sam Zheng
In recent years, reinforcement learning methods have been applied to model gameplay with great success, achieving super-human performance in various environments, such as Atari, Go and Poker.
no code implementations • 1 Jul 2018 • Baoxiang Wang, Tongfang Sun, Xianjun Sam Zheng
Using the results of motivation modeling, we also predict and explain their diverse gameplay behaviors and provide a quantitative assessment of how the redesign of the game environment impacts players' behaviors.
no code implementations • 10 May 2018 • Kenny Young, Baoxiang Wang, Matthew E. Taylor
Finally, we apply Metatrace for control with nonlinear function approximation in 5 games in the Arcade Learning Environment where we explore how it impacts learning speed and robustness to initial step-size choice.
1 code implementation • 9 May 2018 • Jiajin Li, Baoxiang Wang
Policy optimization on high-dimensional continuous control tasks is difficult due to the large variance of policy gradient estimators.
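The variance problem the paper targets can be seen even in a toy bandit: the score-function (REINFORCE) gradient estimate has much lower variance once a baseline is subtracted from the reward, without introducing bias. A generic running-average-baseline sketch, not the estimator proposed in the paper:

```python
import math
import random

def grad_variance(num_samples, use_baseline, seed=0):
    """Variance of score-function gradient estimates in a two-arm bandit.

    The policy is a fixed softmax over two arms with deterministic
    rewards; each sample is score * (reward - baseline), the standard
    REINFORCE estimate of the gradient w.r.t. the logit.
    """
    rng = random.Random(seed)
    theta = 0.0                             # logit of arm 1 (kept fixed)
    rewards = {0: 1.0, 1: 2.0}
    baseline, grads = 0.0, []
    for t in range(1, num_samples + 1):
        p1 = 1.0 / (1.0 + math.exp(-theta))
        a = 1 if rng.random() < p1 else 0
        score = (1.0 - p1) if a == 1 else -p1   # d/dtheta log pi(a)
        b = baseline if use_baseline else 0.0
        grads.append(score * (rewards[a] - b))
        baseline += (rewards[a] - baseline) / t  # running mean reward
    mean = sum(grads) / len(grads)
    return sum((g - mean) ** 2 for g in grads) / len(grads)
```

Because the baseline does not depend on the sampled action (beyond past data), the estimator stays unbiased while the spread of the per-sample gradients shrinks substantially.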