no code implementations • 4 Apr 2024 • Jing Dong, Baoxiang Wang, YaoLiang Yu
Our algorithm simultaneously achieves a Nash regret and a regret bound of $O(T^{4/5})$ for potential games, which matches the best available result, without using additional projection steps.
1 code implementation • 19 Aug 2023 • Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
Communication lays the foundation for cooperation in human society and in multi-agent reinforcement learning (MARL).
no code implementations • 19 Jun 2023 • Jing Dong, Jingyu Wu, Siwei Wang, Baoxiang Wang, Wei Chen
The congestion game is a powerful model that encompasses a range of engineering systems such as traffic networks and resource allocation.
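Congestion games are potential games, so sequential best responses converge to a pure Nash equilibrium. A minimal sketch of this convergence, assuming a hypothetical two-link network with load-based latency (not the paper's model):

```python
def best_response_dynamics(num_players=6, rounds=50):
    """Best-response dynamics in a two-link atomic congestion game.

    Each player picks link 0 or 1; the cost of a link is its load
    (number of players using it). Because congestion games admit an
    exact potential function, sequential best responses terminate
    at a pure Nash equilibrium.
    """
    choices = [0] * num_players          # everyone starts on link 0
    for _ in range(rounds):
        changed = False
        for i in range(num_players):
            loads = [choices.count(0), choices.count(1)]
            cur, other = choices[i], 1 - choices[i]
            # Switch if the other link would be strictly cheaper.
            if loads[other] + 1 < loads[cur]:
                choices[i] = other
                changed = True
        if not changed:                  # no profitable deviation: Nash
            break
    return choices
```

With six symmetric players, the dynamics settle on a balanced 3/3 split, where no player can lower their cost by switching.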
1 code implementation • 19 May 2023 • Fang Kong, Jize Xie, Baoxiang Wang, Tao Yao, Shuai Li
The effect is neglected by previous OIM works under IC and linear threshold models.
no code implementations • 18 May 2023 • Wenhao Li, Dan Qiao, Baoxiang Wang, Xiangfeng Wang, Bo Jin, Hongyuan Zha
The difficulty of appropriately assigning credit is particularly heightened in cooperative MARL with sparse reward, due to the concurrent time and structural scales involved.
1 code implementation • NeurIPS 2023 • Yue Lin, Wenhao Li, Hongyuan Zha, Baoxiang Wang
To thrive in those environments, the agent needs to influence other agents so their actions become more helpful and less harmful.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning (+1)
1 code implementation • 23 Feb 2023 • Wenhao Li, Baoxiang Wang, Shanchao Yang, Hongyuan Zha
We propose a simple and effective RL method, Diverse Policy Optimization (DPO), to model the policies in structured action space as the energy-based models (EBM) by following the probabilistic RL framework.
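An energy-based policy assigns each action a probability proportional to the exponentiated negative energy, so low-energy actions dominate while diverse alternatives retain mass. A generic sketch of this policy form (Boltzmann sampling over assumed energies; not the DPO training procedure itself):

```python
import math
import random

def sample_energy_policy(energies, temperature=1.0, rng=None):
    """Sample an action from an energy-based (Boltzmann) policy.

    pi(a) is proportional to exp(-E(a) / T): lower-energy actions are
    more likely, and the temperature T controls how peaked the
    distribution is.
    """
    rng = rng or random.Random(0)
    weights = [math.exp(-e / temperature) for e in energies]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Inverse-CDF sampling from the categorical distribution.
    r, acc = rng.random(), 0.0
    for a, p in enumerate(probs):
        acc += p
        if r < acc:
            return a, probs
    return len(probs) - 1, probs
```

Raising the temperature flattens the distribution, trading off exploitation of the lowest-energy action against policy diversity.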
no code implementations • 14 Feb 2023 • Fang Kong, Xiangcheng Zhang, Baoxiang Wang, Shuai Li
Learning Markov decision processes (MDP) in an adversarial environment has been a challenging problem.
no code implementations • 28 Nov 2022 • Qi Tian, Kun Kuang, Furui Liu, Baoxiang Wang
e.g., one agent follows a random policy while the other agents follow medium-level policies.
no code implementations • 28 Sep 2022 • Jing Dong, Jingwei Li, Baoxiang Wang, Jingzhao Zhang
Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go.
no code implementations • 13 Jun 2022 • Jiawei Xu, Cheng Zhou, Yizheng Zhang, Baoxiang Wang, Lei Han
Integrating the two algorithms yields the complete Relative Policy-Transition Optimization (RPTO) algorithm, in which the policy interacts with both environments simultaneously, so that data collection from the two environments and the policy and transition updates are completed in one closed loop, forming a principled learning framework for policy transfer.
no code implementations • 25 Apr 2022 • Jing Dong, Shiji Zhou, Baoxiang Wang, Han Zhao
We thus study the problem of supervised gradual domain adaptation, where labeled data from shifting distributions are available to the learner along the trajectory, and we aim to learn a classifier on a target data distribution of interest.
no code implementations • 28 Feb 2022 • Jing Dong, Li Shen, Yinggan Xu, Baoxiang Wang
We study the convergence of the actor-critic algorithm with nonlinear function approximation under a nonconvex-nonconcave primal-dual formulation.
no code implementations • 25 Jan 2022 • Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
Temporal difference (TD) learning is a widely used method to evaluate policies in reinforcement learning.
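TD learning evaluates a policy by bootstrapping: each value estimate is nudged toward the reward plus the discounted estimate of the successor state. A minimal tabular TD(0) sketch on a hypothetical random-walk chain (an illustration of the method, not the setting studied in the paper):

```python
import random

def td0_evaluate(num_states, episodes, alpha=0.1, gamma=0.9, seed=0):
    """Tabular TD(0) policy evaluation on a simple random-walk chain.

    States 0..num_states-1; episodes terminate off either end, with
    reward 1 only when exiting past the right end (toy environment).
    """
    rng = random.Random(seed)
    V = [0.0] * num_states
    for _ in range(episodes):
        s = num_states // 2
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            if s_next < 0:
                r, done = 0.0, True
            elif s_next >= num_states:
                r, done = 1.0, True
            else:
                r, done = 0.0, False
            target = r if done else r + gamma * V[s_next]
            V[s] += alpha * (target - V[s])   # TD-error update
            if done:
                break
            s = s_next
    return V
```

After enough episodes the estimates increase monotonically toward the rewarding right end, matching the true value function of the chain.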
no code implementations • 20 Dec 2021 • Qi Tian, Kun Kuang, Baoxiang Wang, Furui Liu, Fei Wu
The node information compression aims to address the problem of what to communicate via learning compact node representations.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning (+3)
1 code implementation • 18 Oct 2021 • Shanchao Yang, Kaili Ma, Baoxiang Wang, Tianshu Yu, Hongyuan Zha
In this case, GNNs can barely learn useful information, resulting in prohibitive difficulty in making actions for successively rewiring edges under a reinforcement learning context.
no code implementations • 9 Sep 2021 • Jing Dong, Shuai Li, Baoxiang Wang
Motivated by the common strategic activities in crowdsourcing labeling, we study the problem of sequentially eliciting information without verification (EIWV) from a heterogeneous and unknown crowd of workers.
no code implementations • 1 Jun 2021 • Jiahui Li, Kun Kuang, Baoxiang Wang, Furui Liu, Long Chen, Fei Wu, Jun Xiao
Specifically, Shapley Value and its desired properties are leveraged in deep MARL to credit any combinations of agents, which grants us the capability to estimate the individual credit for each agent.
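The Shapley value credits each agent with its average marginal contribution over all coalitions, which is what makes it suitable for assigning individual credit in a cooperative team. A small exact-computation sketch (exponential in the number of agents, so only for small teams; a generic illustration, not the paper's deep MARL estimator):

```python
from itertools import combinations
from math import factorial

def shapley_values(agents, coalition_value):
    """Exact Shapley value for each agent.

    coalition_value maps a frozenset of agents to a real payoff.
    Each agent's value is the weighted average of its marginal
    contribution over every coalition it can join.
    """
    n = len(agents)
    phi = {}
    for i in agents:
        others = [a for a in agents if a != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (coalition_value(S | {i}) - coalition_value(S))
        phi[i] = total
    return phi
```

For an additive game, each agent's Shapley value recovers exactly its own contribution, and the values always sum to the grand coalition's payoff (efficiency).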
no code implementations • 24 May 2021 • Kun Wang, Jing Dong, Baoxiang Wang, Shuai Li, Shuo Shao
This paper studies \emph{differential privacy (DP)} and \emph{local differential privacy (LDP)} in cascading bandits.
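The core DP primitive for bandit feedback of this kind is perturbing counts with calibrated noise. A sketch of the textbook Laplace mechanism applied to a click count (sensitivity 1), shown only to illustrate the privacy idea, not the paper's cascading-bandit algorithm:

```python
import math
import random

def private_click_count(clicks, epsilon, seed=0):
    """Differentially private release of a click count.

    Adds Laplace(1/epsilon) noise to the raw count: since adding or
    removing one user changes the count by at most 1, this satisfies
    epsilon-differential privacy for the released statistic.
    """
    rng = random.Random(seed)
    count = sum(clicks)
    scale = 1.0 / epsilon              # sensitivity of a count is 1
    u = rng.random() - 0.5             # inverse-CDF Laplace sampling
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return count + noise
```

Smaller epsilon (stronger privacy) inflates the noise scale, which is precisely the privacy–regret trade-off such bandit analyses quantify.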
1 code implementation • 25 Feb 2021 • Jing Dong, Ke Li, Shuai Li, Baoxiang Wang
Strategic behavior against sequential learning methods, such as "click framing" in real recommendation systems, has been widely observed.
no code implementations • 29 Mar 2020 • Andrej Bogdanov, Baoxiang Wang
In particular, we show that 1. Given a function that has a $k$-partition of cost $\delta$, a partition of cost $\mathcal{O}(k n^2)(\delta + \epsilon)$ can be learned in time $\tilde{\mathcal{O}}(n^2 \mathrm{poly}(1/\epsilon))$ for any $\epsilon > 0$.
no code implementations • ICLR 2020 • Baoxiang Wang, Shuai Li, Jiajin Li, Siu On Chan
We analyze the Gambler's problem, a simple reinforcement learning problem where the gambler has the chance to double or lose their bets until the target is reached.
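The Gambler's problem can be solved exactly by value iteration: at each capital level the gambler maximizes the win probability over all admissible bets. A compact sketch in the standard Sutton–Barto formulation (toy parameters chosen here for illustration):

```python
def gambler_value_iteration(goal=100, p_heads=0.4, theta=1e-9):
    """Value iteration for the Gambler's problem.

    The gambler bets on coin flips with win probability p_heads and
    seeks to reach `goal` units of capital; reward 1 is given only on
    reaching the goal, so V[s] is the probability of winning from s.
    """
    V = [0.0] * (goal + 1)
    V[goal] = 1.0
    while True:
        delta = 0.0
        for s in range(1, goal):
            best = 0.0
            for bet in range(1, min(s, goal - s) + 1):
                v = p_heads * V[s + bet] + (1 - p_heads) * V[s - bet]
                best = max(best, v)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V
```

For a subfair coin ($p < 1/2$), bold play is optimal: from half the target, betting everything gives win probability exactly $p$, which the computed value function reproduces.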
1 code implementation • NeurIPS 2019 • Baoxiang Wang, Nidhi Hegde
Our aim is to protect the value function approximator, without regard to the number of states queried to the function.
no code implementations • 29 May 2019 • Baoxiang Wang
Binary determination of the presence of objects is one of the problems where humans perform extraordinarily better than computer vision systems, in terms of both speed and precision.
1 code implementation • 30 Jan 2019 • Baoxiang Wang, Nidhi Hegde
Our aim is to protect the value function approximator, without regard to the number of states queried to the function.
no code implementations • 27 Sep 2018 • Baoxiang Wang, Tongfang Sun, Xianjun Sam Zheng
In recent years, reinforcement learning methods have been applied to model gameplay with great success, achieving super-human performance in various environments, such as Atari, Go and Poker.
no code implementations • 1 Jul 2018 • Baoxiang Wang, Tongfang Sun, Xianjun Sam Zheng
Using the results of motivation modeling, we also predict and explain their diverse gameplay behaviors and provide a quantitative assessment of how the redesign of the game environment impacts players' behaviors.
no code implementations • 10 May 2018 • Kenny Young, Baoxiang Wang, Matthew E. Taylor
Finally, we apply Metatrace for control with nonlinear function approximation in 5 games in the Arcade Learning Environment where we explore how it impacts learning speed and robustness to initial step-size choice.
1 code implementation • 9 May 2018 • Jiajin Li, Baoxiang Wang
Policy optimization on high-dimensional continuous control tasks is difficult due to the large variance of policy gradient estimators.
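The variance problem the paper targets can be seen even in a toy bandit: the score-function (REINFORCE) gradient estimate has much lower variance once a baseline is subtracted from the reward, without introducing bias. A generic running-average-baseline sketch, not the estimator proposed in the paper:

```python
import math
import random

def grad_variance(num_samples, use_baseline, seed=0):
    """Variance of score-function gradient estimates in a two-arm bandit.

    The policy is a fixed softmax over two arms with deterministic
    rewards; each sample is score * (reward - baseline), the standard
    REINFORCE estimate of the gradient w.r.t. the logit.
    """
    rng = random.Random(seed)
    theta = 0.0                             # logit of arm 1 (kept fixed)
    rewards = {0: 1.0, 1: 2.0}
    baseline, grads = 0.0, []
    for t in range(1, num_samples + 1):
        p1 = 1.0 / (1.0 + math.exp(-theta))
        a = 1 if rng.random() < p1 else 0
        score = (1.0 - p1) if a == 1 else -p1   # d/dtheta log pi(a)
        b = baseline if use_baseline else 0.0
        grads.append(score * (rewards[a] - b))
        baseline += (rewards[a] - baseline) / t  # running mean reward
    mean = sum(grads) / len(grads)
    return sum((g - mean) ** 2 for g in grads) / len(grads)
```

Because the baseline does not depend on the sampled action (beyond past data), the estimator stays unbiased while the spread of the per-sample gradients shrinks substantially.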