Search Results for author: Pan Xu

Found 48 papers, 9 papers with code

Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning

no code implementations 14 Mar 2024 Zhishuai Liu, Pan Xu

Distributionally robust offline reinforcement learning (RL), which seeks robust policy training against environment perturbation by modeling dynamics uncertainty, calls for function approximations when facing large state-action spaces.

Offline RL Reinforcement Learning (RL)

Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs

1 code implementation 24 Dec 2023 Tianyuan Jin, Hao-Lun Hsu, William Chang, Pan Xu

Specifically, we assume there is a local reward for each hyperedge, and the reward of the joint arm is the sum of these local rewards.

Computational Efficiency Thompson Sampling
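The additive reward structure described in this abstract can be made concrete with a small sketch. The hyperedges, arm choices, and local reward rule below are illustrative assumptions, not taken from the paper:

```python
# Hypothetical setup: 3 agents, each choosing from 2 arms, with local
# rewards defined on hyperedges (subsets of agents). The hyperedges and
# the matching-based local reward rule are illustrative only.
hyperedges = [(0, 1), (1, 2)]  # each hyperedge couples a subset of agents

def local_reward(edge, joint_arm):
    # Hypothetical local reward: 1 if all agents on this hyperedge
    # pick the same arm, else 0 (noise omitted for brevity).
    arms = [joint_arm[i] for i in edge]
    return 1.0 if len(set(arms)) == 1 else 0.0

def joint_reward(joint_arm):
    # The reward of the joint arm is the sum of the local rewards.
    return sum(local_reward(e, joint_arm) for e in hyperedges)

r_all_match = joint_reward((0, 0, 0))  # both hyperedges agree -> 2.0
r_mismatch = joint_reward((0, 1, 0))   # neither hyperedge agrees -> 0.0
```

Because the joint reward decomposes over hyperedges, each agent's posterior only needs to track the local rewards it participates in, which is what makes the sparse-hypergraph setting tractable.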

Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization

no code implementations 24 Oct 2023 Zhen Qin, Zhishuai Liu, Pan Xu

Yet, existing analyses of signSGD rely on the assumption that data are sampled with replacement in each iteration, contradicting the practical implementation in which data are randomly reshuffled and sequentially fed into the algorithm.
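The reshuffling scheme in question is easy to sketch: each epoch draws one random permutation of the data and sweeps through it sequentially, rather than sampling indices with replacement. The toy least-squares problem and all hyperparameters below are illustrative, not the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem; A, b, and the step size are illustrative.
n, d = 64, 5
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star

def grad_i(x, i):
    # Gradient of the i-th component loss 0.5 * (A[i] @ x - b[i])**2.
    return (A[i] @ x - b[i]) * A[i]

def mse(x):
    return 0.5 * np.mean((A @ x - b) ** 2)

def sign_rr(x, lr=1e-3, epochs=50):
    # Sign-based SGD with random reshuffling: each epoch permutes the
    # data once and visits every sample exactly once (no replacement).
    for _ in range(epochs):
        for i in rng.permutation(n):
            x = x - lr * np.sign(grad_i(x, i))
    return x

x_final = sign_rr(np.zeros(d))
```

The only moving part relative to plain signSGD is `rng.permutation(n)`: within an epoch the gradients are dependent, which is exactly what breaks the with-replacement analyses the abstract refers to.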

Optimal Batched Best Arm Identification

no code implementations 21 Oct 2023 Tianyuan Jin, Yu Yang, Jing Tang, Xiaokui Xiao, Pan Xu

Based on Tri-BBAI, we further propose the almost optimal batched best arm identification (Opt-BBAI) algorithm, which is the first algorithm that achieves the near-optimal sample and batch complexity in the non-asymptotic setting (i.e., $\delta>0$ is arbitrarily fixed), while enjoying the same batch and sample complexity as Tri-BBAI when $\delta$ tends to zero.

Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

1 code implementation 29 May 2023 Haque Ishfaq, Qingfeng Lan, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli

One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings.

Efficient Exploration reinforcement-learning +2

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

no code implementations 30 Nov 2022 Yizhou Zhang, Guannan Qu, Pan Xu, Yiheng Lin, Zaiwei Chen, Adam Wierman

In particular, we show that, despite restricting each agent's attention to only its $\kappa$-hop neighborhood, the agents are able to learn a policy with an optimality gap that decays polynomially in $\kappa$.

Multi-agent Reinforcement Learning reinforcement-learning +1

Equity Promotion in Public Transportation

no code implementations 26 Nov 2022 Anik Pramanik, Pan Xu, Yifan Xu

Specifically, we aim to design a strategy of allocating a given limited budget to different candidate programs such that the overall social equity is maximized, which is defined as the minimum covering ratio among all pre-specified protected groups of households (based on race, income, etc.).
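The max-min equity objective in this abstract can be illustrated with a greedy toy allocation: repeatedly give the next unit of budget to whichever protected group currently has the lowest covering ratio. The group names, starting ratios, and per-unit coverage gain below are hypothetical; this is not the paper's allocation strategy:

```python
# Greedy sketch of the max-min equity objective: push budget toward the
# currently worst-covered group. All numbers here are hypothetical.
groups = {"group_a": 0.30, "group_b": 0.10, "group_c": 0.20}  # covering ratios
gain_per_unit = 0.05   # hypothetical coverage gain per unit of budget
budget = 6

for _ in range(budget):
    worst = min(groups, key=groups.get)       # group with lowest coverage
    groups[worst] = groups[worst] + gain_per_unit

# Social equity = minimum covering ratio across all protected groups.
social_equity = min(groups.values())
```

Greedily lifting the minimum is only a heuristic; with interacting programs and a shared budget the actual problem is a constrained optimization, which is what the paper addresses.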

Langevin Monte Carlo for Contextual Bandits

1 code implementation 22 Jun 2022 Pan Xu, Hongkai Zheng, Eric Mazumdar, Kamyar Azizzadenesheli, Anima Anandkumar

Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample from in high-dimensional applications for general covariance matrices.

Multi-Armed Bandits Thompson Sampling
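The alternative the abstract points to, sampling from the posterior by running Langevin dynamics instead of fitting a Laplace approximation, can be sketched for a linear contextual bandit. The model, step size, and step count below are assumptions for illustration, not the paper's exact LMC-TS algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear contextual bandit with 5 candidate actions in dimension 3;
# the data-generating process and hyperparameters are illustrative.
d = 3
features = rng.normal(size=(5, d))
theta_true = rng.normal(size=d)
obs_x, obs_y = [], []
for _ in range(20):  # pretend we already observed some (feature, reward) pairs
    x = features[rng.integers(5)]
    obs_x.append(x)
    obs_y.append(x @ theta_true + 0.1 * rng.normal())
X, y = np.array(obs_x), np.array(obs_y)

def grad_neg_log_post(theta, lam=1.0):
    # Gradient of the negative log posterior for a Gaussian likelihood
    # with a Gaussian prior (ridge term lam * theta).
    return X.T @ (X @ theta - y) + lam * theta

def lmc_sample(steps=200, eta=1e-3):
    # Langevin Monte Carlo: noisy gradient descent on the negative log
    # posterior; the final iterate is an approximate posterior sample,
    # with no Gaussian (Laplace) approximation needed.
    theta = np.zeros(d)
    for _ in range(steps):
        noise = rng.normal(size=d)
        theta = theta - eta * grad_neg_log_post(theta) + np.sqrt(2 * eta) * noise
    return theta

theta_sample = lmc_sample()
action = int(np.argmax(features @ theta_sample))  # Thompson-style action choice
```

The only posterior access the sampler needs is the gradient of the log density, which is what makes the approach compatible with general (even neural) reward models where a Laplace approximation is poor.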

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

no code implementations 7 Jun 2022 Tianyuan Jin, Pan Xu, Xiaokui Xiao, Anima Anandkumar

We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family, which covers many common reward distributions including Bernoulli, Gaussian, Gamma, Exponential, etc.

Multi-Armed Bandits Thompson Sampling
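The Bernoulli case mentioned in the abstract is the classic Beta-Bernoulli instance of Thompson sampling and fits in a few lines. The arm means and horizon below are illustrative:

```python
import random

# Beta-Bernoulli Thompson sampling: the Bernoulli special case of the
# exponential-family setting above. Arm means are illustrative.
random.seed(0)
means = [0.2, 0.5, 0.8]
alpha = [1] * 3   # Beta(1, 1) priors
beta = [1] * 3

for t in range(2000):
    # Sample a mean estimate per arm from its Beta posterior, then pull
    # the arm with the largest sample.
    samples = [random.betavariate(alpha[i], beta[i]) for i in range(3)]
    arm = samples.index(max(samples))
    reward = 1 if random.random() < means[arm] else 0
    alpha[arm] += reward
    beta[arm] += 1 - reward

pulls = [alpha[i] + beta[i] - 2 for i in range(3)]  # per-arm pull counts
```

For other one-dimensional exponential families (Gaussian, Gamma, Exponential) the same loop applies with the matching conjugate posterior in place of the Beta update.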

Equity Promotion in Online Resource Allocation

no code implementations 8 Dec 2021 Pan Xu, Yifan Xu

We consider online resource allocation under a typical non-profit setting, where limited or even scarce resources are administered by a not-for-profit organization like a government.

Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons

1 code implementation 8 Oct 2021 Yue Wu, Tao Jin, Hao Lou, Pan Xu, Farzad Farnoud, Quanquan Gu

In heterogeneous rank aggregation problems, users often exhibit various accuracy levels when comparing pairs of items.

Fairness Maximization among Offline Agents in Online-Matching Markets

no code implementations 18 Sep 2021 Will Ma, Pan Xu, Yifan Xu

Examples of online and offline agents include keywords (online) and sponsors (offline) in Google Advertising; workers (online) and tasks (offline) in Amazon Mechanical Turk (AMT); riders (online) and drivers (offline when restricted to a short time window) in ridesharing.

Decision Making Fairness

Trading the System Efficiency for the Income Equality of Drivers in Rideshare

no code implementations 12 Dec 2020 Yifan Xu, Pan Xu

Rigorous online competitive ratio analysis is offered to demonstrate the flexibility and efficiency of our online algorithms in balancing the two conflicting goals of fairness promotion and profit maximization.

Fairness

A Unified Model for the Two-stage Offline-then-Online Resource Allocation

no code implementations 12 Dec 2020 Yifan Xu, Pan Xu, Jianping Pan, Jun Tao

In this paper, we propose a unified model which incorporates both offline and online resource allocation into a single framework.

Decision Making

Neural Contextual Bandits with Deep Representation and Shallow Exploration

no code implementations NeurIPS 2021 Pan Xu, Zheng Wen, Handong Zhao, Quanquan Gu

We study a general class of contextual bandits, where each context-action pair is associated with a raw feature vector, but the reward generating function is unknown.

Multi-Armed Bandits Representation Learning

A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods

no code implementations NeurIPS 2020 Yue Wu, Weitong Zhang, Pan Xu, Quanquan Gu

In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d.

Vocal Bursts Valence Prediction

Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

no code implementations 19 Oct 2020 Difan Zou, Pan Xu, Quanquan Gu

We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.

A Finite Time Analysis of Two Time-Scale Actor Critic Methods

no code implementations 4 May 2020 Yue Wu, Weitong Zhang, Pan Xu, Quanquan Gu

In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d.

Vocal Bursts Valence Prediction

MOTS: Minimax Optimal Thompson Sampling

no code implementations 3 Mar 2020 Tianyuan Jin, Pan Xu, Jieming Shi, Xiaokui Xiao, Quanquan Gu

Thompson sampling is one of the most widely used algorithms for many online decision problems, due to its simplicity in implementation and superior empirical performance over other state-of-the-art methods.

Thompson Sampling

Double Explore-then-Commit: Asymptotic Optimality and Beyond

no code implementations 21 Feb 2020 Tianyuan Jin, Pan Xu, Xiaokui Xiao, Quanquan Gu

In this paper, we show that a variant of the ETC algorithm can actually achieve asymptotic optimality for multi-armed bandit problems, as UCB-type algorithms do, and we extend it to the batched bandit setting.
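A plain explore-then-commit (ETC) baseline makes the abstract's setting concrete: pull each arm a fixed number of times, then commit to the empirical best for the rest of the horizon. The doubling/refinement that makes the paper's variant asymptotically optimal is omitted, and the means and horizon below are illustrative:

```python
import random

# Plain explore-then-commit for a two-armed Bernoulli bandit.
# Arm means, exploration length m, and horizon are illustrative.
random.seed(0)
means = [0.4, 0.6]

def etc(m=100, horizon=2000):
    # Exploration phase: pull each arm m times.
    sums = [0.0, 0.0]
    t = 0
    for arm in (0, 1):
        for _ in range(m):
            sums[arm] += 1 if random.random() < means[arm] else 0
            t += 1
    best = 0 if sums[0] > sums[1] else 1
    total_reward = sums[0] + sums[1]
    # Commitment phase: play the empirically best arm until the horizon.
    for _ in range(horizon - t):
        total_reward += 1 if random.random() < means[best] else 0
    return best, total_reward

best_arm, total = etc()
```

Fixed-`m` ETC is known to be suboptimal in general; the point of the paper is that a suitably modified ETC closes this gap while remaining batch-friendly.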

Balancing the Tradeoff between Profit and Fairness in Rideshare Platforms During High-Demand Hours

1 code implementation 18 Dec 2019 Vedant Nanda, Pan Xu, Karthik Abinav Sankararaman, John P. Dickerson, Aravind Srinivasan

Moreover, if in such a scenario, the assignment of requests to drivers (by the platform) is made only to maximize profit and/or minimize wait time for riders, requests of a certain type (e.g., from a non-popular pickup location, or to a non-popular drop-off location) might never be assigned to a driver.

Fairness

A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

no code implementations 10 Dec 2019 Pan Xu, Quanquan Gu

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms.

Q-Learning Reinforcement Learning (RL)

Rank Aggregation via Heterogeneous Thurstone Preference Models

1 code implementation 3 Dec 2019 Tao Jin, Pan Xu, Quanquan Gu, Farzad Farnoud

By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone's original framework, and as such, also extends the Bradley-Terry-Luce (BTL) model for pairwise comparisons to heterogeneous populations of users.

Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction

1 code implementation NeurIPS 2019 Difan Zou, Pan Xu, Quanquan Gu

Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) algorithms have received increasing attention in both theory and practice.

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

no code implementations 29 May 2019 Pan Xu, Felicia Gao, Quanquan Gu

We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima

no code implementations NeurIPS 2018 Yaodong Yu, Pan Xu, Quanquan Gu

We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently.

Stochastic Optimization

Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

no code implementations NeurIPS 2018 Dongruo Zhou, Pan Xu, Quanquan Gu

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions.

Sample Efficient Stochastic Variance-Reduced Cubic Regularization Method

no code implementations 29 Nov 2018 Dongruo Zhou, Pan Xu, Quanquan Gu

The proposed algorithm achieves a lower sample complexity of Hessian matrix computation than existing cubic regularization based methods.

Continuous and Discrete-time Accelerated Stochastic Mirror Descent for Strongly Convex Functions

no code implementations ICML 2018 Pan Xu, Tianhao Wang, Quanquan Gu

We provide a second-order stochastic differential equation (SDE), which characterizes the continuous-time dynamics of accelerated stochastic mirror descent (ASMD) for strongly convex functions.

Stochastic Optimization

Covariate Adjusted Precision Matrix Estimation via Nonconvex Optimization

no code implementations ICML 2018 Jinghui Chen, Pan Xu, Lingxiao Wang, Jian Ma, Quanquan Gu

We propose a nonconvex estimator for the covariate adjusted precision matrix estimation problem in the high dimensional regime, under sparsity constraints.

Finding Local Minima via Stochastic Nested Variance Reduction

no code implementations 22 Jun 2018 Dongruo Zhou, Pan Xu, Quanquan Gu

For general stochastic optimization problems, the proposed $\text{SNVRG}^{+}+\text{Neon2}^{\text{online}}$ achieves $\tilde{O}(\epsilon^{-3}+\epsilon_H^{-5}+\epsilon^{-2}\epsilon_H^{-3})$ gradient complexity, which is better than both $\text{SVRG}+\text{Neon2}^{\text{online}}$ (Allen-Zhu and Li, 2017) and Natasha2 (Allen-Zhu, 2017) in certain regimes.

Stochastic Optimization

Stochastic Nested Variance Reduction for Nonconvex Optimization

no code implementations NeurIPS 2018 Dongruo Zhou, Pan Xu, Quanquan Gu

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions.

Attenuate Locally, Win Globally: An Attenuation-based Framework for Online Stochastic Matching with Timeouts

no code implementations 22 Apr 2018 Brian Brubach, Karthik Abinav Sankararaman, Aravind Srinivasan, Pan Xu

On the upper bound side, we show that this framework, combined with a black-box adapted from Bansal et al. (Algorithmica, 2012), yields an online algorithm which nearly doubles the ratio to 0.46.

Stochastic Variance-Reduced Hamilton Monte Carlo Methods

no code implementations ICML 2018 Difan Zou, Pan Xu, Quanquan Gu

We propose a fast stochastic Hamilton Monte Carlo (HMC) method, for sampling from a smooth and strongly log-concave distribution.

Stochastic Optimization

Stochastic Variance-Reduced Cubic Regularized Newton Method

no code implementations ICML 2018 Dongruo Zhou, Pan Xu, Quanquan Gu

At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for cubic regularization method.

Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima

no code implementations 18 Dec 2017 Yaodong Yu, Pan Xu, Quanquan Gu

We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently.

Stochastic Optimization

Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimization

no code implementations NeurIPS 2017 Pan Xu, Jian Ma, Quanquan Gu

In order to speed up the estimation of the sparse plus low-rank components, we propose a sparsity constrained maximum likelihood estimator based on matrix factorization and an efficient alternating gradient descent algorithm with hard thresholding to solve it.

Allocation Problems in Ride-Sharing Platforms: Online Matching with Offline Reusable Resources

no code implementations 22 Nov 2017 John P. Dickerson, Karthik A. Sankararaman, Aravind Srinivasan, Pan Xu

Prior work addresses online bipartite matching markets, where agents arrive over time and are dynamically matched to a known set of disposable resources.

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

no code implementations NeurIPS 2018 Pan Xu, Jinghui Chen, Difan Zou, Quanquan Gu

Furthermore, for the first time we prove the global convergence guarantee for variance reduced stochastic gradient Langevin dynamics (SVRG-LD) to the almost minimizer within $\tilde O\big(\sqrt{n}d^5/(\lambda^4\epsilon^{5/2})\big)$ stochastic gradient evaluations, which outperforms the gradient complexities of GLD and SGLD in a wide regime.

Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimizations

no code implementations NeurIPS 2017 Pan Xu, Jian Ma, Quanquan Gu

In order to speed up the estimation of the sparse plus low-rank components, we propose a sparsity constrained maximum likelihood estimator based on matrix factorization, and an efficient alternating gradient descent algorithm with hard thresholding to solve it.

Communication-efficient Distributed Estimation and Inference for Transelliptical Graphical Models

no code implementations 29 Dec 2016 Pan Xu, Lu Tian, Quanquan Gu

In detail, the proposed method distributes the $d$-dimensional data of size $N$ generated from a transelliptical graphical model into $m$ worker machines, and estimates the latent precision matrix on each worker machine based on the data of size $n=N/m$.

Semiparametric Differential Graph Models

no code implementations NeurIPS 2016 Pan Xu, Quanquan Gu

In many cases of network analysis, it is more attractive to study how a network varies under different conditions than an individual static network.
