Search Results for author: Baoxiang Wang

Found 29 papers, 9 papers with code

Convergence to Nash Equilibrium and No-regret Guarantee in (Markov) Potential Games

no code implementations • 4 Apr 2024 • Jing Dong, Baoxiang Wang, YaoLiang Yu

Our algorithm simultaneously achieves a Nash regret and a regret bound of $O(T^{4/5})$ for potential games, which matches the best available result, without using additional projection steps.

Taming the Exponential Action Set: Sublinear Regret and Fast Convergence to Nash Equilibrium in Online Congestion Games

no code implementations • 19 Jun 2023 • Jing Dong, Jingyu Wu, Siwei Wang, Baoxiang Wang, Wei Chen

The congestion game is a powerful model that encompasses a range of engineering systems such as traffic networks and resource allocation.

Online Influence Maximization under Decreasing Cascade Model

1 code implementation • 19 May 2023 • Fang Kong, Jize Xie, Baoxiang Wang, Tao Yao, Shuai Li

The effect is neglected by previous online influence maximization (OIM) works under the independent cascade (IC) and linear threshold models.

Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning

no code implementations • 18 May 2023 • Wenhao Li, Dan Qiao, Baoxiang Wang, Xiangfeng Wang, Bo Jin, Hongyuan Zha

The difficulty of appropriately assigning credit is particularly heightened in cooperative MARL with sparse reward, due to the concurrent time and structural scales involved.

Decision Making, Multi-agent Reinforcement Learning

Information Design in Multi-Agent Reinforcement Learning

1 code implementation • NeurIPS 2023 • Yue Lin, Wenhao Li, Hongyuan Zha, Baoxiang Wang

To thrive in those environments, the agent needs to influence other agents so their actions become more helpful and less harmful.

Multi-agent Reinforcement Learning, Reinforcement Learning

Diverse Policy Optimization for Structured Action Space

1 code implementation • 23 Feb 2023 • Wenhao Li, Baoxiang Wang, Shanchao Yang, Hongyuan Zha

We propose a simple and effective RL method, Diverse Policy Optimization (DPO), which models policies in structured action spaces as energy-based models (EBMs), following the probabilistic RL framework.

Reinforcement Learning (RL)

Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization

no code implementations • 14 Feb 2023 • Fang Kong, Xiangcheng Zhang, Baoxiang Wang, Shuai Li

Learning Markov decision processes (MDPs) in an adversarial environment has been a challenging problem.

Online Policy Optimization for Robust MDP

no code implementations • 28 Sep 2022 • Jing Dong, Jingwei Li, Baoxiang Wang, Jingzhao Zhang

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go.

Reinforcement Learning (RL)

Relative Policy-Transition Optimization for Fast Policy Transfer

no code implementations • 13 Jun 2022 • Jiawei Xu, Cheng Zhou, Yizheng Zhang, Baoxiang Wang, Lei Han

Integrating the two algorithms yields the complete Relative Policy-Transition Optimization (RPTO) algorithm, in which the policy interacts with both environments simultaneously, so that data collection from the two environments, policy updates, and transition updates are completed in one closed loop, forming a principled learning framework for policy transfer.

Continuous Control, LEMMA

Algorithms and Theory for Supervised Gradual Domain Adaptation

no code implementations • 25 Apr 2022 • Jing Dong, Shiji Zhou, Baoxiang Wang, Han Zhao

We thus study the problem of supervised gradual domain adaptation, where labeled data from shifting distributions are available to the learner along the trajectory, and we aim to learn a classifier on a target data distribution of interest.

Domain Adaptation

Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation

no code implementations • 28 Feb 2022 • Jing Dong, Li Shen, Yinggan Xu, Baoxiang Wang

We study the convergence of the actor-critic algorithm with nonlinear function approximation under a nonconvex-nonconcave primal-dual formulation.

Continuous Control, OpenAI Gym

Edge Rewiring Goes Neural: Boosting Network Resilience without Rich Features

1 code implementation • 18 Oct 2021 • Shanchao Yang, Kaili Ma, Baoxiang Wang, Tianshu Yu, Hongyuan Zha

In this case, GNNs can barely learn useful information, which makes it prohibitively difficult to choose actions for successively rewiring edges in a reinforcement learning context.

Reinforcement Learning (RL)

Incentivizing an Unknown Crowd

no code implementations • 9 Sep 2021 • Jing Dong, Shuai Li, Baoxiang Wang

Motivated by the common strategic activities in crowdsourced labeling, we study the problem of sequentially eliciting information without verification (EIWV) from a heterogeneous and unknown crowd of workers.

Reinforcement Learning (RL)

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

no code implementations • 1 Jun 2021 • Jiahui Li, Kun Kuang, Baoxiang Wang, Furui Liu, Long Chen, Fei Wu, Jun Xiao

Specifically, the Shapley value and its desirable properties are leveraged in deep MARL to credit any combination of agents, which lets us estimate the individual credit of each agent.

Counterfactual, Multi-agent Reinforcement Learning
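The excerpt above uses the Shapley value as a credit-assignment tool. As a hedged illustration only, and not the paper's method (which estimates counterfactual credits inside deep MARL rather than enumerating coalitions), the sketch below computes exact Shapley values from the textbook formula: an agent's credit is its marginal contribution v(S ∪ {i}) − v(S), averaged over coalitions S of the other agents with weight |S|!(n − |S| − 1)!/n!. The helper `shapley_values` and the toy sparse team reward `team_reward` are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Hashable, Iterable


def shapley_values(
    players: Iterable[Hashable],
    value: Callable[[FrozenSet[Hashable]], float],
) -> Dict[Hashable, float]:
    """Exact Shapley values via the textbook formula (hypothetical helper, not the paper's code)."""
    players = list(players)
    n = len(players)
    credits: Dict[Hashable, float] = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        # Enumerate every coalition S of the other players, grouped by size k.
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of player i when it joins coalition S.
                total += weight * (value(s | {i}) - value(s))
        credits[i] = total
    return credits


# Toy sparse team reward (hypothetical): the team scores 1 only if agents "a" and "b" both act.
def team_reward(coalition: FrozenSet[Hashable]) -> float:
    return 1.0 if {"a", "b"} <= coalition else 0.0


credits = shapley_values(["a", "b", "c"], team_reward)
print(credits)  # "a" and "b" split the reward equally; "c" receives no credit
```

The credits sum to the full team reward v({a, b, c}) = 1 (the efficiency property). Exact enumeration needs on the order of 2^n value queries per agent, which is why practical MARL methods approximate it.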

Cascading Bandit under Differential Privacy

no code implementations • 24 May 2021 • Kun Wang, Jing Dong, Baoxiang Wang, Shuai Li, Shuo Shao

This paper studies differential privacy (DP) and local differential privacy (LDP) in cascading bandits.

Combinatorial Bandits under Strategic Manipulations

1 code implementation • 25 Feb 2021 • Jing Dong, Ke Li, Shuai Li, Baoxiang Wang

Strategic behavior against sequential learning methods, such as "click framing" in real recommendation systems, has been widely observed.

Multi-Armed Bandits, Recommendation Systems

Learning and Testing Variable Partitions

no code implementations • 29 Mar 2020 • Andrej Bogdanov, Baoxiang Wang

In particular, we show that (1) given a function that has a $k$-partition of cost $\delta$, a partition of cost $\mathcal{O}(k n^2)(\delta + \epsilon)$ can be learned in time $\tilde{\mathcal{O}}(n^2 \mathrm{poly}(1/\epsilon))$ for any $\epsilon > 0$.

The Gambler's Problem and Beyond

no code implementations • ICLR 2020 • Baoxiang Wang, Shuai Li, Jiajin Li, Siu On Chan

We analyze the Gambler's problem, a simple reinforcement learning problem in which the gambler can double or lose each bet until the target is reached.

Q-Learning, Reinforcement Learning
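The excerpt describes the Gambler's problem only in words. A minimal sketch of the standard textbook formulation may help; it illustrates the classic setup, not the paper's analysis. Assumptions: capital s is an integer between 0 and a target (100 here), each stake a is at most min(s, target − s), a bet is won with probability p (0.4 here), and V(s) is the probability of reaching the target. Value iteration repeatedly applies V(s) ← max over a of p·V(s + a) + (1 − p)·V(s − a). The function name `gamblers_value_iteration` and its defaults are hypothetical.

```python
from typing import List


def gamblers_value_iteration(p_win: float = 0.4, target: int = 100, tol: float = 1e-10) -> List[float]:
    """Value iteration for the discrete Gambler's problem (illustrative sketch, assumed parameters)."""
    # V[s] = probability of reaching `target` from capital s; V[0] = 0 (ruin), V[target] = 1 (goal).
    V = [0.0] * (target + 1)
    V[target] = 1.0
    while True:
        delta = 0.0
        for s in range(1, target):
            # Stakes run from 1 up to what the gambler holds or still needs, whichever is smaller.
            best = max(
                p_win * V[s + a] + (1 - p_win) * V[s - a]
                for a in range(1, min(s, target - s) + 1)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel) update
        if delta < tol:
            return V


V = gamblers_value_iteration()
print(V[25], V[50], V[75])
```

With p = 0.4, staking everything at capital 50 already attains the optimum, so V[50] comes out at 0.4 (up to floating-point error). Starting from V = 0 makes the iterates increase monotonically toward the optimal values.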

Privacy-Preserving Q-Learning with Functional Noise in Continuous Spaces

1 code implementation • NeurIPS 2019 • Baoxiang Wang, Nidhi Hegde

Our aim is to protect the value function approximator, without regard to the number of states queried to the function.

Privacy Preserving, Q-Learning

Recurrent Existence Determination Through Policy Optimization

no code implementations • 29 May 2019 • Baoxiang Wang

Binary determination of the presence of objects is one of the problems where humans perform far better than computer vision systems, in terms of both speed and precision.

Privacy-preserving Q-Learning with Functional Noise in Continuous State Spaces

1 code implementation • 30 Jan 2019 • Baoxiang Wang, Nidhi Hegde

Our aim is to protect the value function approximator, without regard to the number of states queried to the function.

Privacy Preserving, Q-Learning

Beyond Winning and Losing: Modeling Human Motivations and Behaviors with Vector-valued Inverse Reinforcement Learning

no code implementations • 27 Sep 2018 • Baoxiang Wang, Tongfang Sun, Xianjun Sam Zheng

In recent years, reinforcement learning methods have been applied to model gameplay with great success, achieving super-human performance in various environments, such as Atari, Go and Poker.

Beyond Winning and Losing: Modeling Human Motivations and Behaviors Using Inverse Reinforcement Learning

no code implementations • 1 Jul 2018 • Baoxiang Wang, Tongfang Sun, Xianjun Sam Zheng

Using the results of motivation modeling, we also predict and explain their diverse gameplay behaviors and provide a quantitative assessment of how the redesign of the game environment impacts players' behaviors.

Reinforcement Learning (RL)

Metatrace Actor-Critic: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control

no code implementations • 10 May 2018 • Kenny Young, Baoxiang Wang, Matthew E. Taylor

Finally, we apply Metatrace for control with nonlinear function approximation in 5 games in the Arcade Learning Environment where we explore how it impacts learning speed and robustness to initial step-size choice.

Atari Games, Meta-Learning

Policy Optimization with Second-Order Advantage Information

1 code implementation • 9 May 2018 • Jiajin Li, Baoxiang Wang

Policy optimization on high-dimensional continuous control tasks is difficult because of the large variance of policy gradient estimators.

Continuous Control
