Search Results for author: Zhengyuan Zhou

Found 54 papers, 9 papers with code

Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits

no code implementations ICML 2020 Nian Si, Fan Zhang, Zhengyuan Zhou, Jose Blanchet

We first present a policy evaluation procedure in the ambiguous environment and also give a heuristic algorithm to solve the distributionally robust policy learning problems efficiently.

Multi-Armed Bandits

Gradient-free Online Learning in Continuous Games with Delayed Rewards

no code implementations ICML 2020 Amélie Héliou, Panayotis Mertikopoulos, Zhengyuan Zhou

Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback.

Multi-Armed Bandits Recommendation Systems

Statistical Learning of Distributionally Robust Stochastic Control in Continuous State Spaces

no code implementations 17 Jun 2024 Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou

We explore the control of stochastic systems with potentially continuous state and action spaces, characterized by the state dynamics $X_{t+1} = f(X_t, A_t, W_t)$.

Adaptively Learning to Select-Rank in Online Platforms

no code implementations 7 Jun 2024 Jingyuan Wang, Perry Dong, Ying Jin, Ruohan Zhan, Zhengyuan Zhou

We develop a user response model that considers diverse user preferences and the varying effects of item positions, aiming to optimize overall user satisfaction with the ranked list.

Multi-Armed Bandits Thompson Sampling

On the Last-Iterate Convergence of Shuffling Gradient Methods

no code implementations 12 Mar 2024 Zijian Liu, Zhengyuan Zhou

Shuffling gradient methods are widely used in modern machine learning tasks and include three popular implementations: Random Reshuffle (RR), Shuffle Once (SO), and Incremental Gradient (IG).

Stochastic contextual bandits with graph feedback: from independence number to MAS number

no code implementations 12 Feb 2024 Yuxiao Wen, Yanjun Han, Zhengyuan Zhou

Interestingly, $\beta_M(G)$ interpolates between $\alpha(G)$ (the independence number of the graph) and $\mathsf{m}(G)$ (the maximum acyclic subgraph (MAS) number of the graph) as the number of contexts $M$ varies.

Multi-Armed Bandits

Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods

no code implementations 13 Dec 2023 Zijian Liu, Zhengyuan Zhou

For Lipschitz convex functions, different works have established the optimal $O(\log(1/\delta)\log T/\sqrt{T})$ or $O(\sqrt{\log(1/\delta)/T})$ high-probability convergence rates for the final iterate, where $T$ is the time horizon and $\delta$ is the failure probability.

On the Foundation of Distributionally Robust Reinforcement Learning

no code implementations 15 Nov 2023 Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou

This is accomplished through a comprehensive modeling framework centered around distributionally robust Markov decision processes (DRMDPs).

reinforcement-learning

Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback

no code implementations 21 Oct 2023 Michael I. Jordan, Tianyi Lin, Zhengyuan Zhou

Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of $\Theta(\log T)$ for strongly convex cost functions; and (2) in the multi-agent setting of strongly monotone games, with each agent employing OGD, we obtain last-iterate convergence of the joint action to a unique Nash equilibrium at an optimal rate of $\Theta(\frac{1}{T})$.

Sample Complexity of Variance-reduced Distributionally Robust Q-learning

no code implementations 28 May 2023 Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou

Dynamic decision-making under distributional shifts is of fundamental interest in theory and applications of reinforcement learning: The distribution of the environment in which the data is collected can differ from that of the environment in which the model is deployed.

Decision Making Q-Learning

Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises: High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation

no code implementations 22 Mar 2023 Zijian Liu, Zhengyuan Zhou

Recently, several studies have considered the stochastic optimization problem in a heavy-tailed noise regime, i.e., the difference between the stochastic gradient and the true gradient is assumed to have a finite $p$-th moment (say, upper bounded by $\sigma^{p}$ for some $\sigma\geq0$) where $p\in(1, 2]$, which not only generalizes the traditional finite variance assumption ($p=2$) but has also been observed in practice for several different tasks.

Stochastic Optimization

A Finite Sample Complexity Bound for Distributionally Robust Q-learning

no code implementations 26 Feb 2023 Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou

We consider a reinforcement learning setting in which the deployment environment is different from the training environment.

Q-Learning

Breaking the Lower Bound with (Little) Structure: Acceleration in Non-Convex Stochastic Optimization with Heavy-Tailed Noise

no code implementations 14 Feb 2023 Zijian Liu, Jiawei Zhang, Zhengyuan Zhou

For this class of problems, we propose the first variance-reduced accelerated algorithm and establish that it guarantees a high-probability convergence rate of $O(\log(T/\delta)T^{\frac{1-p}{2p-1}})$ under a mild condition, which is faster than $\Omega(T^{\frac{1-p}{3p-2}})$.

Stochastic Optimization

Near-Optimal Non-Convex Stochastic Optimization under Generalized Smoothness

no code implementations 13 Feb 2023 Zijian Liu, Srikanth Jagabathula, Zhengyuan Zhou

Two recent works established the $O(\epsilon^{-3})$ sample complexity to obtain an $O(\epsilon)$-stationary point.

Stochastic Optimization

Single-Trajectory Distributionally Robust Reinforcement Learning

no code implementations 27 Jan 2023 Zhipeng Liang, Xiaoteng Ma, Jose Blanchet, Jiheng Zhang, Zhengyuan Zhou

As a framework for sequential decision-making, Reinforcement Learning (RL) has been regarded as an essential component leading to Artificial General Intelligence (AGI).

Decision Making Q-Learning +2

Leveraging the Hints: Adaptive Bidding in Repeated First-Price Auctions

no code implementations 5 Nov 2022 Wei Zhang, Yanjun Han, Zhengyuan Zhou, Aaron Flores, Tsachy Weissman

In the past four years, a particularly important development in the digital advertising industry is the shift from second-price auctions to first-price auctions for online display ads.

Marketing

Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation

no code implementations 14 Sep 2022 Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou

Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (real environment in which the policy is deployed) and the training environment (e.g., a simulator).

Offline RL reinforcement-learning +1

Optimal Diagonal Preconditioning

1 code implementation 2 Sep 2022 Zhaonan Qu, Wenzhi Gao, Oliver Hinder, Yinyu Ye, Zhengyuan Zhou

Moreover, our implementation of customized solvers, combined with a random row/column sampling step, can find near-optimal diagonal preconditioners for matrices up to size 200,000 in reasonable time, demonstrating their practical appeal.

Learning to Order for Inventory Systems with Lost Sales and Uncertain Supplies

no code implementations 10 Jul 2022 Boxiao Chen, Jiashuo Jiang, Jiawei Zhang, Zhengyuan Zhou

We aim to minimize the $T$-period cost, a problem that is known to be computationally intractable even under known distributions of demand and supply.

Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning

1 code implementation 19 Feb 2022 Nathan Kallus, Xiaojie Mao, Kaiwen Wang, Zhengyuan Zhou

Thanks to a localization technique, LDR$^2$OPE only requires fitting a small number of regressions, just like DR methods for standard OPE.

Off-policy evaluation

Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback

1 code implementation 6 Dec 2021 Wenjia Ba, Tianyi Lin, Jiawei Zhang, Zhengyuan Zhou

Leveraging self-concordant barrier functions, we first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$ under smooth and strongly concave reward functions ($n \geq 1$ is the problem dimension).

Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning

1 code implementation 8 Jul 2021 Yuexiang Zhai, Christina Baek, Zhengyuan Zhou, Jiantao Jiao, Yi Ma

In both OWSP and OWMP settings, we demonstrate that adding intermediate rewards to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state.

Hierarchical Reinforcement Learning Q-Learning +1

Distributed stochastic optimization with large delays

no code implementations 6 Jul 2021 Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter W. Glynn, Yinyu Ye

One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent on distributed computing architectures (possibly) asynchronously.

Distributed Computing Stochastic Optimization

Policy Learning with Adaptively Collected Data

1 code implementation 5 May 2021 Ruohan Zhan, Zhimei Ren, Susan Athey, Zhengyuan Zhou

Learning optimal policies from historical data enables personalization in a wide variety of applications including healthcare, digital recommendations, and online education.

Multi-Armed Bandits

No Weighted-Regret Learning in Adversarial Bandits with Delays

no code implementations 8 Mar 2021 Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet

Using these bounds, we show that FKM and EXP3 have no weighted-regret even for $d_{t}=O\left(t\log t\right)$.

Online Multi-Armed Bandits with Adaptive Inference

no code implementations NeurIPS 2021 Maria Dimakopoulou, Zhimei Ren, Zhengyuan Zhou

During online decision making in Multi-Armed Bandits (MAB), one needs to conduct inference on the true mean reward of each arm based on data collected so far at each step.

Causal Inference Decision Making +2

Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent States

no code implementations 10 Feb 2021 Shi Dong, Benjamin Van Roy, Zhengyuan Zhou

The time it takes to approach asymptotic performance is polynomial in the complexity of the agent's state representation and the time required to evaluate the best policy that the agent can represent.

Q-Learning reinforcement-learning +2

Optimistic Dual Extrapolation for Coherent Non-monotone Variational Inequalities

no code implementations NeurIPS 2020 Chaobing Song, Zhengyuan Zhou, Yichao Zhou, Yong Jiang, Yi Ma

The optimization problems associated with training generative adversarial neural networks can be largely reduced to certain non-monotone variational inequality problems (VIPs), whereas existing convergence results are mostly based on monotone or strongly monotone assumptions.

Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits

no code implementations 27 Aug 2020 Zhimei Ren, Zhengyuan Zhou

We study the problem of dynamic batch learning in high-dimensional sparse linear contextual bandits, where a decision maker, under a given maximum-number-of-batch constraint and only able to observe rewards at the end of each batch, can dynamically decide how many individuals to include in the next batch (at the end of the current batch) and what personalized action-selection scheme to adopt within each batch.

Decision Making Marketing +2

A Unified Linear Speedup Analysis of Federated Averaging and Nesterov FedAvg

no code implementations 11 Jul 2020 Zhaonan Qu, Kaixiang Lin, Zhaojian Li, Jiayu Zhou, Zhengyuan Zhou

For strongly convex and convex problems, we also characterize the corresponding convergence rates for the Nesterov accelerated FedAvg algorithm, which are the first linear speedup guarantees for momentum variants of FedAvg in convex settings.

Distributed Optimization Federated Learning

Learning to Bid Optimally and Efficiently in Adversarial First-price Auctions

no code implementations 9 Jul 2020 Yanjun Han, Zhengyuan Zhou, Aaron Flores, Erik Ordentlich, Tsachy Weissman

In this paper, we take an online learning angle and address the fundamental problem of learning to bid in repeated first-price auctions, where both the bidder's private valuations and other bidders' bids can be arbitrary.

Distributionally Robust Batch Contextual Bandits

no code implementations 10 Jun 2020 Nian Si, Fan Zhang, Zhengyuan Zhou, Jose Blanchet

Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts with a performance guarantee based on the theory of uniform convergence.

Multi-Armed Bandits

DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning

no code implementations 30 Apr 2020 Xiaoteng Ma, Li Xia, Zhengyuan Zhou, Jun Yang, Qianchuan Zhao

In this paper, we present a new reinforcement learning (RL) algorithm called Distributional Soft Actor Critic (DSAC), which exploits the distributional information of accumulated rewards to achieve better performance.

Continuous Control reinforcement-learning +1

Sequential Batch Learning in Finite-Action Linear Contextual Bandits

no code implementations 14 Apr 2020 Yanjun Han, Zhengqing Zhou, Zhengyuan Zhou, Jose Blanchet, Peter W. Glynn, Yinyu Ye

We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe outcomes for the individuals within a batch at the batch's end.

Decision Making Multi-Armed Bandits +1

Optimal No-regret Learning in Repeated First-price Auctions

no code implementations 22 Mar 2020 Yanjun Han, Zhengyuan Zhou, Tsachy Weissman

In this paper, we develop the first learning algorithm that achieves a near-optimal $\widetilde{O}(\sqrt{T})$ regret bound, by exploiting two structural properties of first-price auctions, i.e., the specific feedback structure and payoff function.

Multi-Armed Bandits Thompson Sampling

Interpretable Personalization via Policy Learning with Linear Decision Boundaries

no code implementations 17 Mar 2020 Zhaonan Qu, Isabella Qian, Zhengyuan Zhou

Our findings suggest that our proposed policy learning framework using tools from causal inference and Bayesian optimization provides a promising practical approach to interpretable personalization across a wide range of applications.

Bayesian Optimization BIG-bench Machine Learning +2

Delay-Adaptive Learning in Generalized Linear Contextual Bandits

no code implementations 11 Mar 2020 Jose Blanchet, Renyuan Xu, Zhengyuan Zhou

In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed.

Multi-Armed Bandits Thompson Sampling

Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games

no code implementations ICML 2020 Tianyi Lin, Zhengyuan Zhou, Panayotis Mertikopoulos, Michael I. Jordan

In this paper, we consider multi-agent learning via online gradient descent in a class of games called $\lambda$-cocoercive games, a fairly broad class of games that admits many Nash equilibria and that properly includes unconstrained strongly monotone games.

Provably Efficient Reinforcement Learning with Aggregated States

no code implementations 13 Dec 2019 Shi Dong, Benjamin Van Roy, Zhengyuan Zhou

We establish that an optimistic variant of Q-learning applied to a fixed-horizon episodic Markov decision process with an aggregated state representation incurs regret $\tilde{\mathcal{O}}(\sqrt{H^5 M K} + \epsilon HK)$, where $H$ is the horizon, $M$ is the number of aggregate states, $K$ is the number of episodes, and $\epsilon$ is the largest difference between any pair of optimal state-action values associated with a common aggregate state.

Q-Learning reinforcement-learning +1

Learning in Generalized Linear Contextual Bandits with Stochastic Delays

no code implementations NeurIPS 2019 Zhengyuan Zhou, Renyuan Xu, Jose Blanchet

In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed.

Multi-Armed Bandits

Online EXP3 Learning in Adversarial Bandits with Delayed Feedback

no code implementations NeurIPS 2019 Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet

An adversary chooses the cost of each arm in a bounded interval, and a sequence of feedback delays $\left\{d_{t}\right\}$ that are unknown to the player.

Balanced Linear Contextual Bandits

no code implementations 15 Dec 2018 Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens

Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning.

Causal Inference Multi-Armed Bandits

Learning in Games with Lossy Feedback

no code implementations NeurIPS 2018 Zhengyuan Zhou, Panayotis Mertikopoulos, Susan Athey, Nicholas Bambos, Peter W. Glynn, Yinyu Ye

We consider a game-theoretical multi-agent learning problem where the feedback information can be lost during the learning process and rewards are given by a broad class of games known as variationally stable games.

Offline Multi-Action Policy Learning: Generalization and Optimization

1 code implementation 10 Oct 2018 Zhengyuan Zhou, Susan Athey, Stefan Wager

In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action.

Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go?

no code implementations ICML 2018 Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter Glynn, Yinyu Ye, Li-Jia Li, Li Fei-Fei

One of the most widely used optimization methods for large-scale machine learning problems is distributed asynchronous stochastic gradient descent (DASGD).

Stochastic Mirror Descent in Variationally Coherent Optimization Problems

no code implementations NeurIPS 2017 Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Stephen Boyd, Peter W. Glynn

In this paper, we examine a class of non-convex stochastic optimization problems which we call variationally coherent, and which properly includes pseudo-/quasiconvex and star-convex optimization problems.

Stochastic Optimization

Countering Feedback Delays in Multi-Agent Learning

no code implementations NeurIPS 2017 Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter W. Glynn, Claire Tomlin

We consider a model of game-theoretic learning based on online mirror descent (OMD) with asynchronous and delayed feedback information.

Estimation Considerations in Contextual Bandits

no code implementations 19 Nov 2017 Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens

We develop parametric and non-parametric contextual bandits that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation bias.

Causal Inference Econometrics +1

On the convergence of mirror descent beyond stochastic convex programming

no code implementations 18 Jun 2017 Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Stephen Boyd, Peter Glynn

In this paper, we examine the convergence of mirror descent in a class of stochastic optimization problems that are not necessarily convex (or even quasi-convex), and which we call variationally coherent.

Stochastic Optimization

Learning in games with continuous action sets and unknown payoff functions

no code implementations 25 Aug 2016 Panayotis Mertikopoulos, Zhengyuan Zhou

This paper examines the convergence of no-regret learning in games with continuous action sets.

Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors

no code implementations NeurIPS 2013 Xiaoqin Zhang, Di Wang, Zhengyuan Zhou, Yi Ma

In this context, the state-of-the-art algorithms "RASL" and "TILT" can be viewed as two special cases of our work, and yet each only performs part of the function of our method.

Computational Efficiency
