1 code implementation • 24 Jun 2024 • Junkai Zhang, Weitong Zhang, Dongruo Zhou, Quanquan Gu

The key idea behind our algorithm is an uncertainty-aware intrinsic reward for exploring the environment and an uncertainty-weighted learning process to handle heterogeneous uncertainty in different samples.

no code implementations • 15 Mar 2024 • Zhiyong Wang, Jize Xie, Yi Chen, John C. S. Lui, Dongruo Zhou

We investigate the non-stationary stochastic linear bandit problem where the reward distribution evolves each round.

no code implementations • 5 Mar 2024 • ZiHao Wang, Rui Zhu, Dongruo Zhou, Zhikun Zhang, John Mitchell, Haixu Tang, XiaoFeng Wang

DPAdapter modifies and enhances the sharpness-aware minimization (SAM) technique, utilizing a two-batch strategy to provide a more accurate perturbation estimate and an efficient gradient descent, thereby improving parameter robustness against noise.

no code implementations • 14 Feb 2024 • Qiwei Di, Jiafan He, Dongruo Zhou, Quanquan Gu

Our algorithm achieves an $\tilde{\mathcal O}(dB_*\sqrt{K})$ regret bound, where $d$ is the dimension of the feature mapping in the linear transition kernel, $B_*$ is the upper bound of the total cumulative cost for the optimal policy, and $K$ is the number of episodes.

no code implementations • 23 Nov 2023 • Xuheng Li, Yihe Deng, Jingfeng Wu, Dongruo Zhou, Quanquan Gu

Additionally, when our analysis is specialized to linear regression in the strongly convex setting, it yields a tighter bound for bias error than the best-known result.

no code implementations • 21 Feb 2023 • Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

We propose a variance-adaptive algorithm for linear mixture MDPs, which achieves a problem-dependent horizon-free regret bound that can gracefully reduce to a nearly constant regret for deterministic MDPs.

no code implementations • 12 Dec 2022 • Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu

We study reinforcement learning (RL) with linear function approximation.

no code implementations • 10 Aug 2022 • Chris Junchi Li, Dongruo Zhou, Quanquan Gu, Michael I. Jordan

We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation, where the action-value function is approximated by a function in a Reproducing Kernel Hilbert Space (RKHS).

no code implementations • 23 May 2022 • Dongruo Zhou, Quanquan Gu

When applying our weighted least square estimator to heterogeneous linear bandits, we can obtain an $\tilde O(d\sqrt{\sum_{k=1}^K \sigma_k^2} +d)$ regret in the first $K$ rounds, where $d$ is the dimension of the context and $\sigma_k^2$ is the variance of the reward in the $k$-th round.

no code implementations • 13 May 2022 • Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

We show that for both known $C$ and unknown $C$ cases, our algorithm with proper choice of hyperparameter achieves a regret that nearly matches the lower bounds.

no code implementations • 28 Feb 2022 • Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu

We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise.

no code implementations • ICLR 2022 • Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang

Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvement against their classical counterparts.

no code implementations • NeurIPS 2021 • Heyang Zhao, Dongruo Zhou, Quanquan Gu

We study the linear contextual bandit problem in the presence of adversarial corruption, where the interaction between the player and a possibly infinite decision set is contaminated by an adversary that can corrupt the reward up to a corruption level $C$ measured by the sum of the largest alteration on rewards in each round.

no code implementations • NeurIPS 2021 • Zixiang Chen, Dongruo Zhou, Quanquan Gu

In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima.

no code implementations • NeurIPS 2021 • Weitong Zhang, Dongruo Zhou, Quanquan Gu

By constructing a special class of linear Mixture MDPs, we also prove that for any reward-free algorithm, it needs to sample at least $\tilde \Omega(H^2d\epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy.

Model-based Reinforcement Learning
reinforcement-learning
**+1**

1 code implementation • NeurIPS 2021 • Luyao Yuan, Dongruo Zhou, Junhong Shen, Jingdong Gao, Jeffrey L. Chen, Quanquan Gu, Ying Nian Wu, Song-Chun Zhu

Recently, the benefits of integrating this cooperative pedagogy into machine concept learning in discrete spaces have been proved by multiple works.

no code implementations • NeurIPS 2021 • Yinglun Zhu, Dongruo Zhou, Ruoxi Jiang, Quanquan Gu, Rebecca Willett, Robert Nowak

To overcome the curse of dimensionality, we propose to adaptively embed the feature representation of each arm into a lower-dimensional space and carefully deal with the induced model misspecification.

no code implementations • 22 Jun 2021 • Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu

For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space and achieves gap-dependent sample complexity.

no code implementations • NeurIPS 2021 • Yifei Min, Tianhao Wang, Dongruo Zhou, Quanquan Gu

We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on the offline data collected by a behavior policy.

no code implementations • NeurIPS 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu

The uniform-PAC guarantee is the strongest possible guarantee for reinforcement learning in the literature, which can directly imply both PAC and high probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation.

no code implementations • 25 Feb 2021 • Quanquan Gu, Amin Karbasi, Khashayar Khosravi, Vahab Mirrokni, Dongruo Zhou

In many sequential decision-making problems, the individuals are split into several batches and the decision-maker is only allowed to change her policy at the end of batches.

no code implementations • 17 Feb 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu

In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping, and the reward function can change arbitrarily episode by episode.

no code implementations • 15 Feb 2021 • Zixiang Chen, Dongruo Zhou, Quanquan Gu

To assess the optimality of our algorithm, we also prove an $\tilde{\Omega}( dH\sqrt{T})$ lower bound on the regret.

no code implementations • 15 Feb 2021 • Yue Wu, Dongruo Zhou, Quanquan Gu

We study reinforcement learning in an infinite-horizon average-reward setting with linear function approximation, where the transition probability function of the underlying Markov Decision Process (MDP) admits a linear form over a feature mapping of the current state, action, and next state.

no code implementations • NeurIPS 2021 • Tianhao Wang, Dongruo Zhou, Quanquan Gu

In specific, for the batch learning model, our proposed LSVI-UCB-Batch algorithm achieves an $\tilde O(\sqrt{d^3H^3T} + dHT/B)$ regret, where $d$ is the dimension of the feature mapping, $H$ is the episode length, $T$ is the number of interactions and $B$ is the number of batches.

no code implementations • 15 Dec 2020 • Dongruo Zhou, Quanquan Gu, Csaba Szepesvari

Based on the new inequality, we propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs in the episodic undiscounted setting.

no code implementations • 23 Nov 2020 • Jiafan He, Dongruo Zhou, Quanquan Gu

Reinforcement learning (RL) with linear function approximation has received increasing attention recently.

no code implementations • 19 Nov 2020 • Dongruo Zhou, Jiahao Chen, Quanquan Gu

Multi-objective reinforcement learning (MORL) is an extension of ordinary, single-objective reinforcement learning (RL) that is applicable to many real-world tasks where multiple objectives exist without known relative costs.

2 code implementations • ICLR 2021 • Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.

no code implementations • NeurIPS 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu

We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) under the tabular setting.

no code implementations • 23 Jun 2020 • Dongruo Zhou, Jiafan He, Quanquan Gu

We propose a novel algorithm that makes use of the feature mapping and obtains a $\tilde O(d\sqrt{T}/(1-\gamma)^2)$ regret, where $d$ is the dimension of the feature space, $T$ is the time horizon and $\gamma$ is the discount factor of the MDP.

4 code implementations • ICML 2020 • Dongruo Zhou, Lihong Li, Quanquan Gu

To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.

no code implementations • 25 Sep 2019 • Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Experiments on standard benchmarks show that our proposed algorithm can maintain fast convergence rate as Adam/Amsgrad while generalizing as well as SGD in training deep neural networks.

no code implementations • 25 Sep 2019 • Dongruo Zhou, Lihong Li, Quanquan Gu

To the best of our knowledge, our algorithm is the first neural network-based contextual bandit algorithm with near-optimal regret guarantee.

no code implementations • 31 Jan 2019 • Dongruo Zhou, Quanquan Gu

We prove tight lower bounds for the complexity of finding $\epsilon$-suboptimal point and $\epsilon$-approximate stationary point in different settings, for a wide regime of the smallest eigenvalue of the Hessian of the objective function (or each component function).

no code implementations • 31 Jan 2019 • Dongruo Zhou, Quanquan Gu

Built upon SRVRC, we further propose a Hessian-free SRVRC algorithm, namely SRVRC$_{\text{free}}$, which only requires stochastic gradient and Hessian-vector product computations, and achieves $\tilde O(dn\epsilon^{-2} \land d\epsilon^{-3})$ runtime complexity, where $n$ is the number of component functions in the finite-sum structure, $d$ is the problem dimension, and $\epsilon$ is the optimization precision.

no code implementations • NeurIPS 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions.

no code implementations • 29 Nov 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

The proposed algorithm achieves a lower sample complexity of Hessian matrix computation than existing cubic regularization based methods.

2 code implementations • ICLR 2019 • Jinghui Chen, Dongruo Zhou, Jin-Feng Yi, Quanquan Gu

Depending on how much information an adversary can access to, adversarial attacks can be classified as white-box attack and black-box attack.

no code implementations • 21 Nov 2018 • Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu

In particular, we study the binary classification problem and show that for a broad family of loss functions, with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumption on the training data.

no code implementations • 16 Aug 2018 • Dongruo Zhou, Jinghui Chen, Yuan Cao, Ziyan Yang, Quanquan Gu

In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods including AMSGrad, RMSProp and AdaGrad.

no code implementations • 22 Jun 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

For general stochastic optimization problems, the proposed $\text{SNVRG}^{+}+\text{Neon2}^{\text{online}}$ achieves $\tilde{O}(\epsilon^{-3}+\epsilon_H^{-5}+\epsilon^{-2}\epsilon_H^{-3})$ gradient complexity, which is better than both $\text{SVRG}+\text{Neon2}^{\text{online}}$ (Allen-Zhu and Li, 2017) and Natasha2 (Allen-Zhu, 2017) in certain regimes.

no code implementations • NeurIPS 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions.

2 code implementations • 18 Jun 2018 • Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Experiments on standard benchmarks show that our proposed algorithm can maintain a fast convergence rate as Adam/Amsgrad while generalizing as well as SGD in training deep neural networks.

no code implementations • ICML 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for cubic regularization method.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.