4 code implementations • ICML 2020 • Dongruo Zhou, Lihong Li, Quanquan Gu
To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.
2 code implementations • 18 Jun 2018 • Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu
Experiments on standard benchmarks show that our proposed algorithm can maintain a convergence rate as fast as Adam/AMSGrad while generalizing as well as SGD in training deep neural networks.
2 code implementations • ICLR 2021 • Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu
Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.
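As a rough illustration of the idea behind Thompson Sampling, here is a minimal sketch for the simpler non-contextual Bernoulli bandit with Beta posteriors. It is not the algorithm from the paper above. The function name `thompson_sampling`, the arm means, and the uniform Beta(1, 1) prior are assumptions chosen for the example.

```python
import random


def thompson_sampling(true_means, rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling (hypothetical helper for illustration).

    true_means: success probability of each arm, unknown to the learner.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(rounds):
        # Draw one plausible mean per arm from its Beta posterior
        # (Beta(1, 1) prior, an assumption of this sketch).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)
        ]
        # Act greedily with respect to the sampled means.
        arm = max(range(k), key=lambda a: samples[a])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], rounds=2000))
```

Sampling from the posterior, rather than using its mean, is what drives exploration: arms with few observations have wide posteriors and are occasionally sampled high enough to be tried.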
2 code implementations • ICLR 2019 • Jinghui Chen, Dongruo Zhou, Jin-Feng Yi, Quanquan Gu
Depending on how much information an adversary can access, adversarial attacks can be classified as white-box attacks or black-box attacks.
1 code implementation • NeurIPS 2021 • Luyao Yuan, Dongruo Zhou, Junhong Shen, Jingdong Gao, Jeffrey L. Chen, Quanquan Gu, Ying Nian Wu, Song-Chun Zhu
Recently, the benefits of integrating this cooperative pedagogy into machine concept learning in discrete spaces have been demonstrated in multiple works.
no code implementations • ICML 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu
At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for the cubic regularization method.
no code implementations • NeurIPS 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu
We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions.
no code implementations • 22 Jun 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu
For general stochastic optimization problems, the proposed $\text{SNVRG}^{+}+\text{Neon2}^{\text{online}}$ achieves $\tilde{O}(\epsilon^{-3}+\epsilon_H^{-5}+\epsilon^{-2}\epsilon_H^{-3})$ gradient complexity, which is better than both $\text{SVRG}+\text{Neon2}^{\text{online}}$ (Allen-Zhu and Li, 2017) and Natasha2 (Allen-Zhu, 2017) in certain regimes.
no code implementations • 16 Aug 2018 • Dongruo Zhou, Jinghui Chen, Yuan Cao, Yiqi Tang, Ziyan Yang, Quanquan Gu
In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods including AMSGrad, RMSProp and AdaGrad.
no code implementations • 21 Nov 2018 • Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu
In particular, we study the binary classification problem and show that for a broad family of loss functions, with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumptions on the training data.
no code implementations • 29 Nov 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu
The proposed algorithm achieves a lower sample complexity of Hessian matrix computation than existing cubic regularization based methods.
no code implementations • 31 Jan 2019 • Dongruo Zhou, Quanquan Gu
Built upon SRVRC, we further propose a Hessian-free SRVRC algorithm, namely SRVRC$_{\text{free}}$, which only requires stochastic gradient and Hessian-vector product computations, and achieves $\tilde O(dn\epsilon^{-2} \land d\epsilon^{-3})$ runtime complexity, where $n$ is the number of component functions in the finite-sum structure, $d$ is the problem dimension, and $\epsilon$ is the optimization precision.
no code implementations • 31 Jan 2019 • Dongruo Zhou, Quanquan Gu
We prove tight lower bounds for the complexity of finding an $\epsilon$-suboptimal point and an $\epsilon$-approximate stationary point in different settings, for a wide regime of the smallest eigenvalue of the Hessian of the objective function (or each component function).
no code implementations • 23 Jun 2020 • Dongruo Zhou, Jiafan He, Quanquan Gu
We propose a novel algorithm that makes use of the feature mapping and obtains a $\tilde O(d\sqrt{T}/(1-\gamma)^2)$ regret, where $d$ is the dimension of the feature space, $T$ is the time horizon and $\gamma$ is the discount factor of the MDP.
no code implementations • NeurIPS 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu
We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) under the tabular setting.
no code implementations • 19 Nov 2020 • Dongruo Zhou, Jiahao Chen, Quanquan Gu
Multi-objective reinforcement learning (MORL) is an extension of ordinary, single-objective reinforcement learning (RL) that is applicable to many real-world tasks where multiple objectives exist without known relative costs.
no code implementations • 23 Nov 2020 • Jiafan He, Dongruo Zhou, Quanquan Gu
Reinforcement learning (RL) with linear function approximation has received increasing attention recently.
no code implementations • 15 Dec 2020 • Dongruo Zhou, Quanquan Gu, Csaba Szepesvari
Based on the new inequality, we propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs in the episodic undiscounted setting.
no code implementations • NeurIPS 2021 • Tianhao Wang, Dongruo Zhou, Quanquan Gu
Specifically, for the batch learning model, our proposed LSVI-UCB-Batch algorithm achieves an $\tilde O(\sqrt{d^3H^3T} + dHT/B)$ regret, where $d$ is the dimension of the feature mapping, $H$ is the episode length, $T$ is the number of interactions and $B$ is the number of batches.
no code implementations • 15 Feb 2021 • Yue Wu, Dongruo Zhou, Quanquan Gu
We study reinforcement learning in an infinite-horizon average-reward setting with linear function approximation, where the transition probability function of the underlying Markov Decision Process (MDP) admits a linear form over a feature mapping of the current state, action, and next state.
no code implementations • 15 Feb 2021 • Zixiang Chen, Dongruo Zhou, Quanquan Gu
To assess the optimality of our algorithm, we also prove an $\tilde{\Omega}( dH\sqrt{T})$ lower bound on the regret.
no code implementations • 17 Feb 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu
In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping, and the reward function can change arbitrarily episode by episode.
no code implementations • 25 Feb 2021 • Quanquan Gu, Amin Karbasi, Khashayar Khosravi, Vahab Mirrokni, Dongruo Zhou
In many sequential decision-making problems, the individuals are split into several batches and the decision-maker is only allowed to change her policy at the end of batches.
no code implementations • NeurIPS 2021 • Yifei Min, Tianhao Wang, Dongruo Zhou, Quanquan Gu
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on the offline data collected by a behavior policy.
no code implementations • 22 Jun 2021 • Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu
For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space and achieves gap-dependent sample complexity.
no code implementations • NeurIPS 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu
The uniform-PAC guarantee is the strongest possible guarantee for reinforcement learning in the literature. It directly implies both PAC and high-probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation.
no code implementations • NeurIPS 2021 • Yinglun Zhu, Dongruo Zhou, Ruoxi Jiang, Quanquan Gu, Rebecca Willett, Robert Nowak
To overcome the curse of dimensionality, we propose to adaptively embed the feature representation of each arm into a lower-dimensional space and carefully deal with the induced model misspecification.
no code implementations • NeurIPS 2021 • Weitong Zhang, Dongruo Zhou, Quanquan Gu
By constructing a special class of linear mixture MDPs, we also prove that any reward-free algorithm needs to sample at least $\tilde \Omega(H^2d\epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy.
no code implementations • NeurIPS 2021 • Heyang Zhao, Dongruo Zhou, Quanquan Gu
We study the linear contextual bandit problem in the presence of adversarial corruption, where the interaction between the player and a possibly infinite decision set is contaminated by an adversary that can corrupt the reward up to a corruption level $C$ measured by the sum of the largest alteration on rewards in each round.
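One way to write the corruption level described above, as a sketch only: the symbols $T$ (number of rounds), $\mathcal{D}_t$ (decision set in round $t$), $r_t(a)$ (uncorrupted reward of action $a$) and $\tilde r_t(a)$ (corrupted reward) are notation assumed here, not taken from the excerpt.

$$C = \sum_{t=1}^{T} \sup_{a \in \mathcal{D}_t} \bigl| r_t(a) - \tilde r_t(a) \bigr|$$

Under this reading, $C = 0$ corresponds to the uncorrupted linear contextual bandit problem.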
no code implementations • NeurIPS 2021 • Zixiang Chen, Dongruo Zhou, Quanquan Gu
In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima.
no code implementations • 25 Sep 2019 • Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu
Experiments on standard benchmarks show that our proposed algorithm can maintain a convergence rate as fast as Adam/AMSGrad while generalizing as well as SGD in training deep neural networks.
no code implementations • 25 Sep 2019 • Dongruo Zhou, Lihong Li, Quanquan Gu
To the best of our knowledge, our algorithm is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.
no code implementations • ICLR 2022 • Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang
Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvement against their classical counterparts.
no code implementations • 28 Feb 2022 • Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu
We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise.
no code implementations • 13 May 2022 • Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu
We show that for both the known-$C$ and unknown-$C$ cases, our algorithm with a proper choice of hyperparameters achieves a regret that nearly matches the lower bounds.
no code implementations • 23 May 2022 • Dongruo Zhou, Quanquan Gu
When applying our weighted least squares estimator to heterogeneous linear bandits, we can obtain an $\tilde O(d\sqrt{\sum_{k=1}^K \sigma_k^2} +d)$ regret in the first $K$ rounds, where $d$ is the dimension of the context and $\sigma_k^2$ is the variance of the reward in the $k$-th round.
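The excerpt does not spell out the estimator, so the following is only a generic sketch of variance-weighted ridge regression, in which each sample is weighted by $1/\sigma_k^2$ so that low-noise rounds count more. It assumes the noise levels $\sigma_k$ are known. The names `solve`, `weighted_ridge` and the regularizer `lam` are hypothetical and not taken from the paper.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def weighted_ridge(xs, ys, sigmas, lam=1.0):
    """Minimize lam*||theta||^2 + sum_k (<x_k, theta> - y_k)^2 / sigma_k^2."""
    d = len(xs[0])
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for x, y, s in zip(xs, ys, sigmas):
        w = 1.0 / (s * s)  # inverse-variance weight
        for i in range(d):
            b[i] += w * y * x[i]
            for j in range(d):
                A[i][j] += w * x[i] * x[j]
    return solve(A, b)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [1.0, -2.0]  # ground truth used only to simulate data
    xs, ys, sigmas = [], [], []
    for k in range(500):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        s = 0.1 if k % 2 == 0 else 2.0  # heteroscedastic noise
        xs.append(x)
        sigmas.append(s)
        ys.append(sum(a * c for a, c in zip(x, theta)) + rng.gauss(0, s))
    print(weighted_ridge(xs, ys, sigmas))
```

With equal weights, the noisy rounds would dominate the error of the estimate. Inverse-variance weighting is the standard way to let the error scale with the actual variances, which is the kind of dependence the $\sum_{k=1}^K \sigma_k^2$ term in the bound reflects.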
no code implementations • 10 Aug 2022 • Chris Junchi Li, Dongruo Zhou, Quanquan Gu, Michael I. Jordan
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation, where the action-value function is approximated by a function in a Reproducing Kernel Hilbert Space (RKHS).
no code implementations • 12 Dec 2022 • Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu
We study reinforcement learning (RL) with linear function approximation.
no code implementations • 21 Feb 2023 • Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu
We propose a variance-adaptive algorithm for linear mixture MDPs, which achieves a problem-dependent horizon-free regret bound that can gracefully reduce to a nearly constant regret for deterministic MDPs.
no code implementations • 23 Nov 2023 • Xuheng Li, Yihe Deng, Jingfeng Wu, Dongruo Zhou, Quanquan Gu
Additionally, when our analysis is specialized to linear regression in the strongly convex setting, it yields a tighter bound for the bias error than the best-known result.
no code implementations • 14 Feb 2024 • Qiwei Di, Jiafan He, Dongruo Zhou, Quanquan Gu
Our algorithm achieves an $\tilde{\mathcal O}(dB_*\sqrt{K})$ regret bound, where $d$ is the dimension of the feature mapping in the linear transition kernel, $B_*$ is the upper bound of the total cumulative cost for the optimal policy, and $K$ is the number of episodes.
no code implementations • 5 Mar 2024 • ZiHao Wang, Rui Zhu, Dongruo Zhou, Zhikun Zhang, John Mitchell, Haixu Tang, XiaoFeng Wang
DPAdapter modifies and enhances the sharpness-aware minimization (SAM) technique, utilizing a two-batch strategy to provide a more accurate perturbation estimate and an efficient gradient descent, thereby improving parameter robustness against noise.
no code implementations • 15 Mar 2024 • Zhiyong Wang, Jize Xie, Yi Chen, John C. S. Lui, Dongruo Zhou
We investigate the non-stationary stochastic linear bandit problem where the reward distribution evolves each round.