no code implementations • ICML 2020 • Nian Si, Fan Zhang, Zhengyuan Zhou, Jose Blanchet
We first present a policy evaluation procedure in the ambiguous environment and also give a heuristic algorithm to solve the distributionally robust policy learning problems efficiently.
no code implementations • ICML 2020 • Amélie Héliou, Panayotis Mertikopoulos, Zhengyuan Zhou
Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback.
no code implementations • 17 Jun 2024 • Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou
We explore the control of stochastic systems with potentially continuous state and action spaces, characterized by the state dynamics $X_{t+1} = f(X_t, A_t, W_t)$.
no code implementations • 7 Jun 2024 • Jingyuan Wang, Perry Dong, Ying Jin, Ruohan Zhan, Zhengyuan Zhou
We develop a user response model that considers diverse user preferences and the varying effects of item positions, aiming to optimize overall user satisfaction with the ranked list.
1 code implementation • 16 May 2024 • Yu Xia, Sriram Narayanamoorthy, Zhengyuan Zhou, Joshua Mabry
The development of open benchmarking platforms could greatly accelerate the adoption of AI agents in retail.
no code implementations • 12 Mar 2024 • Zijian Liu, Zhengyuan Zhou
Shuffling gradient methods are widely used in modern machine learning tasks and include three popular implementations: Random Reshuffle (RR), Shuffle Once (SO), and Incremental Gradient (IG).
no code implementations • 12 Feb 2024 • Yuxiao Wen, Yanjun Han, Zhengyuan Zhou
Interestingly, $\beta_M(G)$ interpolates between $\alpha(G)$ (the independence number of the graph) and $\mathsf{m}(G)$ (the maximum acyclic subgraph (MAS) number of the graph) as the number of contexts $M$ varies.
no code implementations • 13 Dec 2023 • Zijian Liu, Zhengyuan Zhou
For Lipschitz convex functions, different works have established the optimal $O(\log(1/\delta)\log T/\sqrt{T})$ or $O(\sqrt{\log(1/\delta)/T})$ high-probability convergence rates for the final iterate, where $T$ is the time horizon and $\delta$ is the failure probability.
no code implementations • 15 Nov 2023 • Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou
This is accomplished through a comprehensive modeling framework centered around distributionally robust Markov decision processes (DRMDPs).
no code implementations • 21 Oct 2023 • Michael I. Jordan, Tianyi Lin, Zhengyuan Zhou
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of $\Theta(\log T)$ for strongly convex cost functions; and (2) in the multi-agent setting of strongly monotone games, with each agent employing OGD, we obtain last-iterate convergence of the joint action to a unique Nash equilibrium at an optimal rate of $\Theta(\frac{1}{T})$.
no code implementations • 28 May 2023 • Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou
Dynamic decision-making under distributional shifts is of fundamental interest in theory and applications of reinforcement learning: The distribution of the environment in which the data is collected can differ from that of the environment in which the model is deployed.
no code implementations • 22 Mar 2023 • Zijian Liu, Zhengyuan Zhou
Recently, several studies consider the stochastic optimization problem but in a heavy-tailed noise regime, i.e., the difference between the stochastic gradient and the true gradient is assumed to have a finite $p$-th moment (say being upper bounded by $\sigma^{p}$ for some $\sigma\geq0$) where $p\in(1, 2]$, which not only generalizes the traditional finite variance assumption ($p=2$) but also has been observed in practice for several different tasks.
no code implementations • 26 Feb 2023 • Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou
We consider a reinforcement learning setting in which the deployment environment is different from the training environment.
no code implementations • 14 Feb 2023 • Zijian Liu, Jiawei Zhang, Zhengyuan Zhou
For this class of problems, we propose the first variance-reduced accelerated algorithm and establish that it guarantees a high-probability convergence rate of $O(\log(T/\delta)T^{\frac{1-p}{2p-1}})$ under a mild condition, which is faster than $\Omega(T^{\frac{1-p}{3p-2}})$.
no code implementations • 13 Feb 2023 • Zijian Liu, Srikanth Jagabathula, Zhengyuan Zhou
Two recent works established the $O(\epsilon^{-3})$ sample complexity to obtain an $O(\epsilon)$-stationary point.
no code implementations • 27 Jan 2023 • Zhipeng Liang, Xiaoteng Ma, Jose Blanchet, Jiheng Zhang, Zhengyuan Zhou
As a framework for sequential decision-making, Reinforcement Learning (RL) has been regarded as an essential component leading to Artificial General Intelligence (AGI).
no code implementations • 5 Nov 2022 • Wei Zhang, Yanjun Han, Zhengyuan Zhou, Aaron Flores, Tsachy Weissman
In the past four years, a particularly important development in the digital advertising industry is the shift from second-price auctions to first-price auctions for online display ads.
no code implementations • 14 Sep 2022 • Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou
Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (real environment in which the policy is deployed) and the training environment (e.g., a simulator).
1 code implementation • 2 Sep 2022 • Zhaonan Qu, Wenzhi Gao, Oliver Hinder, Yinyu Ye, Zhengyuan Zhou
Moreover, our implementation of customized solvers, combined with a random row/column sampling step, can find near-optimal diagonal preconditioners for matrices up to size 200,000 in reasonable time, demonstrating their practical appeal.
no code implementations • 10 Jul 2022 • Boxiao Chen, Jiashuo Jiang, Jiawei Zhang, Zhengyuan Zhou
We aim to minimize the $T$-period cost, a problem that is known to be computationally intractable even under known distributions of demand and supply.
1 code implementation • 19 Feb 2022 • Nathan Kallus, Xiaojie Mao, Kaiwen Wang, Zhengyuan Zhou
Thanks to a localization technique, LDR$^2$OPE only requires fitting a small number of regressions, just like DR methods for standard OPE.
1 code implementation • 6 Dec 2021 • Wenjia Ba, Tianyi Lin, Jiawei Zhang, Zhengyuan Zhou
Leveraging self-concordant barrier functions, we first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$ under smooth and strongly concave reward functions ($n \geq 1$ is the problem dimension).
1 code implementation • 8 Jul 2021 • Yuexiang Zhai, Christina Baek, Zhengyuan Zhou, Jiantao Jiao, Yi Ma
In both OWSP and OWMP settings, we demonstrate that adding intermediate rewards to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state.
no code implementations • 6 Jul 2021 • Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter W. Glynn, Yinyu Ye
One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent on distributed computing architectures (possibly) asynchronously.
1 code implementation • 5 May 2021 • Ruohan Zhan, Zhimei Ren, Susan Athey, Zhengyuan Zhou
Learning optimal policies from historical data enables personalization in a wide variety of applications including healthcare, digital recommendations, and online education.
no code implementations • 8 Mar 2021 • Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet
Using these bounds, we show that FKM and EXP3 have no weighted-regret even for $d_{t}=O\left(t\log t\right)$.
no code implementations • NeurIPS 2021 • Maria Dimakopoulou, Zhimei Ren, Zhengyuan Zhou
During online decision making in Multi-Armed Bandits (MAB), one needs to conduct inference on the true mean reward of each arm based on data collected so far at each step.
no code implementations • 10 Feb 2021 • Shi Dong, Benjamin Van Roy, Zhengyuan Zhou
The time it takes to approach asymptotic performance is polynomial in the complexity of the agent's state representation and the time required to evaluate the best policy that the agent can represent.
no code implementations • NeurIPS 2020 • Chaobing Song, Zhengyuan Zhou, Yichao Zhou, Yong Jiang, Yi Ma
The optimization problems associated with training generative adversarial neural networks can be largely reduced to certain non-monotone variational inequality problems (VIPs), whereas existing convergence results are mostly based on monotone or strongly monotone assumptions.
no code implementations • 27 Aug 2020 • Zhimei Ren, Zhengyuan Zhou
We study the problem of dynamic batch learning in high-dimensional sparse linear contextual bandits, where a decision maker, under a given maximum-number-of-batch constraint and only able to observe rewards at the end of each batch, can dynamically decide how many individuals to include in the next batch (at the end of the current batch) and what personalized action-selection scheme to adopt within each batch.
no code implementations • 11 Jul 2020 • Zhaonan Qu, Kaixiang Lin, Zhaojian Li, Jiayu Zhou, Zhengyuan Zhou
For strongly convex and convex problems, we also characterize the corresponding convergence rates for the Nesterov accelerated FedAvg algorithm, which are the first linear speedup guarantees for momentum variants of FedAvg in convex settings.
no code implementations • 9 Jul 2020 • Yanjun Han, Zhengyuan Zhou, Aaron Flores, Erik Ordentlich, Tsachy Weissman
In this paper, we take an online learning angle and address the fundamental problem of learning to bid in repeated first-price auctions, where both the bidder's private valuations and other bidders' bids can be arbitrary.
no code implementations • 10 Jun 2020 • Nian Si, Fan Zhang, Zhengyuan Zhou, Jose Blanchet
Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts with a performance guarantee based on the theory of uniform convergence.
1 code implementation • ICLR 2020 • Yuexiang Zhai, Hermish Mehta, Zhengyuan Zhou, Yi Ma
Recently, the $\ell^4$-norm maximization has been proposed to solve the sparse dictionary learning (SDL) problem.
no code implementations • 30 Apr 2020 • Xiaoteng Ma, Li Xia, Zhengyuan Zhou, Jun Yang, Qianchuan Zhao
In this paper, we present a new reinforcement learning (RL) algorithm called Distributional Soft Actor Critic (DSAC), which exploits the distributional information of accumulated rewards to achieve better performance.
no code implementations • 14 Apr 2020 • Yanjun Han, Zhengqing Zhou, Zhengyuan Zhou, Jose Blanchet, Peter W. Glynn, Yinyu Ye
We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe outcomes for the individuals within a batch at the batch's end.
no code implementations • 22 Mar 2020 • Yanjun Han, Zhengyuan Zhou, Tsachy Weissman
In this paper, we develop the first learning algorithm that achieves a near-optimal $\widetilde{O}(\sqrt{T})$ regret bound, by exploiting two structural properties of first-price auctions, i.e., the specific feedback structure and payoff function.
no code implementations • 17 Mar 2020 • Zhaonan Qu, Isabella Qian, Zhengyuan Zhou
Our findings suggest that our proposed policy learning framework using tools from causal inference and Bayesian optimization provides a promising practical approach to interpretable personalization across a wide range of applications.
no code implementations • 11 Mar 2020 • Jose Blanchet, Renyuan Xu, Zhengyuan Zhou
In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed.
no code implementations • ICML 2020 • Tianyi Lin, Zhengyuan Zhou, Panayotis Mertikopoulos, Michael I. Jordan
In this paper, we consider multi-agent learning via online gradient descent in a class of games called $\lambda$-cocoercive games, a fairly broad class of games that admits many Nash equilibria and that properly includes unconstrained strongly monotone games.
no code implementations • 13 Dec 2019 • Shi Dong, Benjamin Van Roy, Zhengyuan Zhou
We establish that an optimistic variant of Q-learning applied to a fixed-horizon episodic Markov decision process with an aggregated state representation incurs regret $\tilde{\mathcal{O}}(\sqrt{H^5 M K} + \epsilon HK)$, where $H$ is the horizon, $M$ is the number of aggregate states, $K$ is the number of episodes, and $\epsilon$ is the largest difference between any pair of optimal state-action values associated with a common aggregate state.
no code implementations • NeurIPS 2019 • Zhengyuan Zhou, Renyuan Xu, Jose Blanchet
In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed.
no code implementations • NeurIPS 2019 • Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet
An adversary chooses the cost of each arm in a bounded interval, and a sequence of feedback delays $\{d_{t}\}$ that are unknown to the player.
no code implementations • 15 Dec 2018 • Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning.
no code implementations • NeurIPS 2018 • Zhengyuan Zhou, Panayotis Mertikopoulos, Susan Athey, Nicholas Bambos, Peter W. Glynn, Yinyu Ye
We consider a game-theoretical multi-agent learning problem where the feedback information can be lost during the learning process and rewards are given by a broad class of games known as variationally stable games.
1 code implementation • 10 Oct 2018 • Zhengyuan Zhou, Susan Athey, Stefan Wager
In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action.
no code implementations • ICML 2018 • Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter Glynn, Yinyu Ye, Li-Jia Li, Li Fei-Fei
One of the most widely used optimization methods for large-scale machine learning problems is distributed asynchronous stochastic gradient descent (DASGD).
1 code implementation • ICML 2018 • Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei
Recent deep networks are capable of memorizing the entire data even when the labels are completely random.
Ranked #16 on Image Classification on WebVision-1000
no code implementations • NeurIPS 2017 • Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Stephen Boyd, Peter W. Glynn
In this paper, we examine a class of non-convex stochastic optimization problems which we call variationally coherent, and which properly includes pseudo-/quasiconvex and star-convex optimization problems.
no code implementations • NeurIPS 2017 • Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter W. Glynn, Claire Tomlin
We consider a model of game-theoretic learning based on online mirror descent (OMD) with asynchronous and delayed feedback information.
no code implementations • 19 Nov 2017 • Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens
We develop parametric and non-parametric contextual bandits that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation bias.
no code implementations • 18 Jun 2017 • Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Stephen Boyd, Peter Glynn
In this paper, we examine the convergence of mirror descent in a class of stochastic optimization problems that are not necessarily convex (or even quasi-convex), and which we call variationally coherent.
no code implementations • 25 Aug 2016 • Panayotis Mertikopoulos, Zhengyuan Zhou
This paper examines the convergence of no-regret learning in games with continuous action sets.
no code implementations • NeurIPS 2013 • Xiaoqin Zhang, Di Wang, Zhengyuan Zhou, Yi Ma
In this context, the state-of-the-art algorithms "RASL" and "TILT" can be viewed as two special cases of our work, and yet each only performs part of the function of our method.