Search Results for author: Dongruo Zhou

Found 44 papers, 5 papers with code

Neural Contextual Bandits with UCB-based Exploration

4 code implementations • ICML 2020 • Dongruo Zhou, Lihong Li, Quanquan Gu

To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.

Efficient Exploration, Multi-Armed Bandits
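The UCB-style exploration described in the snippet can be illustrated with a minimal sketch. It assumes each arm already has a predicted reward and a feature vector `g`; in NeuralUCB, these come from a neural network and its gradient. The function names, the exploration scale `gamma`, and the identity initialization of `Z` are illustrative assumptions, and any scaling constants the paper uses are omitted.

```python
import numpy as np


def ucb_scores(pred, feats, Z, gamma):
    """Optimistic score per arm: predicted reward plus an exploration bonus.

    pred:  (K,) predicted rewards, one per arm (assumed given)
    feats: (K, p) per-arm feature vectors g (in NeuralUCB, network gradients)
    Z:     (p, p) positive-definite design matrix built from past chosen features
    gamma: exploration scale (hypothetical hyperparameter)
    """
    Z_inv = np.linalg.inv(Z)
    # bonus_a = gamma * sqrt(g_a^T Z^{-1} g_a): large for rarely explored directions
    bonus = gamma * np.sqrt(np.einsum("kp,pq,kq->k", feats, Z_inv, feats))
    return pred + bonus


def update_design(Z, g):
    """Rank-one update Z <- Z + g g^T after playing the arm with feature g."""
    return Z + np.outer(g, g)


# Usage: arm 0 has been played twice, so its bonus has shrunk.
Z = np.eye(2)
for _ in range(2):
    Z = update_design(Z, np.array([1.0, 0.0]))
scores = ucb_scores(np.array([0.5, 0.4]), np.eye(2), Z, gamma=1.0)
chosen = int(np.argmax(scores))  # arm 1: lower prediction but larger bonus
```

The bonus shrinks along directions that have already been played often. This is how an optimistic rule trades off exploiting a high prediction against exploring uncertain arms.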

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

2 code implementations • 18 Jun 2018 • Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Experiments on standard benchmarks show that our proposed algorithm can maintain a convergence rate as fast as Adam/AMSGrad while generalizing as well as SGD in training deep neural networks.
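The algorithm proposed in this paper is Padam (partially adaptive momentum estimation). The minimal sketch below assumes AMSGrad-style moment estimates and shows the one change Padam makes: the second-moment denominator is raised to a partial power `p` in (0, 1/2]. With `p = 1/2` the step matches AMSGrad, and as `p` approaches 0 it approaches SGD with momentum. The hyperparameter values are illustrative, not the paper's recommendations.

```python
import numpy as np


def padam_step(x, grad, state, lr=0.1, beta1=0.9, beta2=0.999, p=0.125, eps=1e-8):
    """One partially adaptive momentum step; state = (m, v, v_hat)."""
    m, v, v_hat = state
    m = beta1 * m + (1 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    v_hat = np.maximum(v_hat, v)                # AMSGrad-style running maximum
    # p = 0.5 recovers AMSGrad; p -> 0 approaches SGD with momentum
    x = x - lr * m / (v_hat + eps) ** p
    return x, (m, v, v_hat)


# Usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = np.array([1.0, -2.0])
state = (np.zeros_like(x), np.zeros_like(x), np.zeros_like(x))
for _ in range(300):
    x, state = padam_step(x, x, state)
```

Interpolating the exponent is what lets a single update rule sit between fully adaptive methods and SGD with momentum.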

Neural Thompson Sampling

2 code implementations • ICLR 2021 • Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.

Multi-Armed Bandits, Thompson Sampling
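Thompson Sampling chooses an arm by sampling model parameters from a posterior and then acting greedily on that sample. Neural Thompson Sampling builds the posterior around a neural network. The minimal sketch below swaps in a simpler linear-Gaussian reward model instead, which gives classical linear Thompson sampling. The posterior scale `v`, the identity ridge prior on `A`, and the function names are illustrative assumptions.

```python
import numpy as np


def linear_ts_choose(contexts, A, b, v, rng):
    """Pick an arm by sampling theta from N(A^{-1} b, v^2 A^{-1}).

    contexts: (K, d) feature vector per arm
    A:        (d, d) regularized design matrix, lambda * I + sum x x^T
    b:        (d,)   reward-weighted feature sum, sum r * x
    v:        posterior scale (hypothetical hyperparameter)
    """
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                                   # ridge estimate
    theta_tilde = rng.multivariate_normal(theta_hat, v ** 2 * A_inv)
    return int(np.argmax(contexts @ theta_tilde))           # greedy w.r.t. the sample


def linear_ts_update(A, b, x, r):
    """Add the observed (feature, reward) pair to the sufficient statistics."""
    return A + np.outer(x, x), b + r * x


# Usage: two arms with noiseless rewards 1 and 0; TS should settle on arm 0.
rng = np.random.default_rng(0)
contexts = np.eye(2)
theta_true = np.array([1.0, 0.0])
A, b = np.eye(2), np.zeros(2)
pulls = [0, 0]
for _ in range(200):
    k = linear_ts_choose(contexts, A, b, v=0.1, rng=rng)
    pulls[k] += 1
    A, b = linear_ts_update(A, b, contexts[k], contexts[k] @ theta_true)
```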

A Frank-Wolfe Framework for Efficient and Effective Adversarial Attacks

2 code implementations • ICLR 2019 • Jinghui Chen, Dongruo Zhou, Jin-Feng Yi, Quanquan Gu

Depending on how much information an adversary can access, adversarial attacks can be classified as white-box attacks or black-box attacks.

Adversarial Attack

Iterative Teacher-Aware Learning

1 code implementation • NeurIPS 2021 • Luyao Yuan, Dongruo Zhou, Junhong Shen, Jingdong Gao, Jeffrey L. Chen, Quanquan Gu, Ying Nian Wu, Song-Chun Zhu

Recently, the benefits of integrating this cooperative pedagogy into machine concept learning in discrete spaces have been demonstrated by multiple works.

Stochastic Variance-Reduced Cubic Regularized Newton Method

no code implementations • ICML 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for the cubic regularization method.
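The basic variance-reduction idea behind semi-stochastic estimators can be illustrated with the classic SVRG gradient estimator. This is a swapped-in illustration. The paper's semi-stochastic gradient and Hessian are designed specifically for cubic regularization and may include extra correction terms, which this sketch does not reproduce. It assumes least-squares components f_i(x) = 0.5 (a_i^T x - b_i)^2, and the function names are hypothetical.

```python
import numpy as np


def component_grads(A, b, x, idx):
    """Gradients of f_i(x) = 0.5 * (a_i^T x - b_i)^2 for i in idx, shape (len(idx), d)."""
    residual = A[idx] @ x - b[idx]
    return A[idx] * residual[:, None]


def full_grad(A, b, x):
    """Gradient of F(x) = (1/n) * sum_i f_i(x)."""
    return component_grads(A, b, x, np.arange(len(b))).mean(axis=0)


def svrg_grad(A, b, x, x_ref, g_ref, idx):
    """SVRG estimator: mean_i [grad f_i(x) - grad f_i(x_ref)] + grad F(x_ref).

    Unbiased for grad F(x) when idx is sampled uniformly; its variance shrinks
    as x approaches the reference point x_ref.
    """
    correction = component_grads(A, b, x, idx) - component_grads(A, b, x_ref, idx)
    return correction.mean(axis=0) + g_ref


# Usage
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 3)), rng.normal(size=20)
x_ref = rng.normal(size=3)
g_ref = full_grad(A, b, x_ref)          # computed once per outer loop
x = x_ref + 0.01 * rng.normal(size=3)
batch = rng.choice(20, size=5, replace=False)
v = svrg_grad(A, b, x, x_ref, g_ref, batch)
```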

Stochastic Nested Variance Reduction for Nonconvex Optimization

no code implementations • NeurIPS 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions.

Finding Local Minima via Stochastic Nested Variance Reduction

no code implementations • 22 Jun 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

For general stochastic optimization problems, the proposed $\text{SNVRG}^{+}+\text{Neon2}^{\text{online}}$ achieves $\tilde{O}(\epsilon^{-3}+\epsilon_H^{-5}+\epsilon^{-2}\epsilon_H^{-3})$ gradient complexity, which is better than both $\text{SVRG}+\text{Neon2}^{\text{online}}$ (Allen-Zhu and Li, 2017) and Natasha2 (Allen-Zhu, 2017) in certain regimes.

Stochastic Optimization

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

no code implementations • 16 Aug 2018 • Dongruo Zhou, Jinghui Chen, Yuan Cao, Yiqi Tang, Ziyan Yang, Quanquan Gu

In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods including AMSGrad, RMSProp and AdaGrad.

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

no code implementations • 21 Nov 2018 • Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu

In particular, we study the binary classification problem and show that for a broad family of loss functions, with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under a mild assumption on the training data.

Binary Classification

Sample Efficient Stochastic Variance-Reduced Cubic Regularization Method

no code implementations • 29 Nov 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

The proposed algorithm achieves a lower sample complexity of Hessian matrix computation than existing cubic regularization based methods.

Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

no code implementations • NeurIPS 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions.

Stochastic Recursive Variance-Reduced Cubic Regularization Methods

no code implementations • 31 Jan 2019 • Dongruo Zhou, Quanquan Gu

Built upon SRVRC, we further propose a Hessian-free SRVRC algorithm, namely SRVRC$_{\text{free}}$, which only requires stochastic gradient and Hessian-vector product computations, and achieves $\tilde O(dn\epsilon^{-2} \land d\epsilon^{-3})$ runtime complexity, where $n$ is the number of component functions in the finite-sum structure, $d$ is the problem dimension, and $\epsilon$ is the optimization precision.

Lower Bounds for Smooth Nonconvex Finite-Sum Optimization

no code implementations • 31 Jan 2019 • Dongruo Zhou, Quanquan Gu

We prove tight lower bounds for the complexity of finding an $\epsilon$-suboptimal point and an $\epsilon$-approximate stationary point in different settings, for a wide regime of the smallest eigenvalue of the Hessian of the objective function (or each component function).

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

no code implementations • 23 Jun 2020 • Dongruo Zhou, Jiafan He, Quanquan Gu

We propose a novel algorithm that makes use of the feature mapping and obtains a $\tilde O(d\sqrt{T}/(1-\gamma)^2)$ regret, where $d$ is the dimension of the feature space, $T$ is the time horizon and $\gamma$ is the discount factor of the MDP.

Reinforcement Learning (RL)

Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

no code implementations • NeurIPS 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu

We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) under the tabular setting.

Reinforcement Learning (RL)

Provable Multi-Objective Reinforcement Learning with Generative Models

no code implementations • 19 Nov 2020 • Dongruo Zhou, Jiahao Chen, Quanquan Gu

Multi-objective reinforcement learning (MORL) is an extension of ordinary, single-objective reinforcement learning (RL) that is applicable to many real-world tasks where multiple objectives exist without known relative costs.

Multi-Objective Reinforcement Learning, Q-Learning

Logarithmic Regret for Reinforcement Learning with Linear Function Approximation

no code implementations • 23 Nov 2020 • Jiafan He, Dongruo Zhou, Quanquan Gu

Reinforcement learning (RL) with linear function approximation has received increasing attention recently.

Reinforcement Learning (RL)

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

no code implementations • 15 Dec 2020 • Dongruo Zhou, Quanquan Gu, Csaba Szepesvari

Based on the new inequality, we propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs in the episodic undiscounted setting.

Reinforcement Learning (RL)

Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints

no code implementations • NeurIPS 2021 • Tianhao Wang, Dongruo Zhou, Quanquan Gu

Specifically, for the batch learning model, our proposed LSVI-UCB-Batch algorithm achieves an $\tilde O(\sqrt{d^3H^3T} + dHT/B)$ regret, where $d$ is the dimension of the feature mapping, $H$ is the episode length, $T$ is the number of interactions and $B$ is the number of batches.

Reinforcement Learning (RL)

Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

no code implementations • 15 Feb 2021 • Yue Wu, Dongruo Zhou, Quanquan Gu

We study reinforcement learning in an infinite-horizon average-reward setting with linear function approximation, where the transition probability function of the underlying Markov Decision Process (MDP) admits a linear form over a feature mapping of the current state, action, and next state.

Almost Optimal Algorithms for Two-player Zero-Sum Linear Mixture Markov Games

no code implementations • 15 Feb 2021 • Zixiang Chen, Dongruo Zhou, Quanquan Gu

To assess the optimality of our algorithm, we also prove an $\tilde{\Omega}( dH\sqrt{T})$ lower bound on the regret.

Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs

no code implementations • 17 Feb 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu

In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping, and the reward function can change arbitrarily episode by episode.

Reinforcement Learning (RL)

Batched Neural Bandits

no code implementations • 25 Feb 2021 • Quanquan Gu, Amin Karbasi, Khashayar Khosravi, Vahab Mirrokni, Dongruo Zhou

In many sequential decision-making problems, the individuals are split into several batches and the decision-maker is only allowed to change her policy at the end of each batch.

Decision Making

Variance-Aware Off-Policy Evaluation with Linear Function Approximation

no code implementations • NeurIPS 2021 • Yifei Min, Tianhao Wang, Dongruo Zhou, Quanquan Gu

We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on the offline data collected by a behavior policy.

Off-policy evaluation

Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL

no code implementations • 22 Jun 2021 • Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu

For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space, and that it achieves gap-dependent sample complexity.

Offline RL, Reinforcement Learning

Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation

no code implementations • NeurIPS 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu

The uniform-PAC guarantee is the strongest possible guarantee for reinforcement learning in the literature, which can directly imply both PAC and high probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation.

Reinforcement Learning (RL)

Pure Exploration in Kernel and Neural Bandits

no code implementations • NeurIPS 2021 • Yinglun Zhu, Dongruo Zhou, Ruoxi Jiang, Quanquan Gu, Rebecca Willett, Robert Nowak

To overcome the curse of dimensionality, we propose to adaptively embed the feature representation of each arm into a lower-dimensional space and carefully deal with the induced model misspecification.

Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

no code implementations • NeurIPS 2021 • Weitong Zhang, Dongruo Zhou, Quanquan Gu

By constructing a special class of linear mixture MDPs, we also prove that any reward-free algorithm needs to sample at least $\tilde \Omega(H^2d\epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy.

Model-based Reinforcement Learning, Reinforcement Learning

Linear Contextual Bandits with Adversarial Corruptions

no code implementations • NeurIPS 2021 • Heyang Zhao, Dongruo Zhou, Quanquan Gu

We study the linear contextual bandit problem in the presence of adversarial corruption, where the interaction between the player and a possibly infinite decision set is contaminated by an adversary that can corrupt the reward up to a corruption level $C$ measured by the sum of the largest alteration on rewards in each round.

Multi-Armed Bandits
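The corruption level $C$ can be made concrete for a finite set of arms. In each round, take the largest absolute change the adversary makes to any arm's reward, then sum over rounds. The minimal sketch below assumes the clean and corrupted rewards are available as T x K tables. The paper allows a possibly infinite decision set, where the per-round maximum becomes a supremum. The function name is hypothetical.

```python
import numpy as np


def corruption_level(clean, corrupted):
    """C = sum over rounds t of max over arms a of |corrupted[t, a] - clean[t, a]|."""
    clean = np.asarray(clean, dtype=float)
    corrupted = np.asarray(corrupted, dtype=float)
    per_round = np.abs(corrupted - clean).max(axis=1)   # largest alteration in each round
    return float(per_round.sum())


# Usage: 3 rounds, 2 arms; the adversary alters only the first two rounds.
clean = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
corrupted = [[0.8, 0.0], [0.5, 0.9], [0.0, 1.0]]
C = corruption_level(clean, corrupted)  # 0.2 + 0.4 + 0.0
```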

Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

no code implementations • NeurIPS 2021 • Zixiang Chen, Dongruo Zhou, Quanquan Gu

In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima.

Training Deep Neural Networks with Partially Adaptive Momentum

no code implementations • 25 Sep 2019 • Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Experiments on standard benchmarks show that our proposed algorithm can maintain a convergence rate as fast as Adam/AMSGrad while generalizing as well as SGD in training deep neural networks.

NeuralUCB: Contextual Bandits with Neural Network-Based Exploration

no code implementations • 25 Sep 2019 • Dongruo Zhou, Lihong Li, Quanquan Gu

To the best of our knowledge, our algorithm is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.

Efficient Exploration, Multi-Armed Bandits

Learning Neural Contextual Bandits Through Perturbed Rewards

no code implementations • ICLR 2022 • Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang

Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvements over their classical counterparts.

Computational Efficiency, Multi-Armed Bandits

Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits

no code implementations • 28 Feb 2022 • Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu

We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise.

Regression

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

no code implementations • 13 May 2022 • Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

We show that for both known $C$ and unknown $C$ cases, our algorithm with a proper choice of the hyperparameter achieves a regret that nearly matches the lower bounds.

Multi-Armed Bandits

Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs

no code implementations • 23 May 2022 • Dongruo Zhou, Quanquan Gu

When applying our weighted least-squares estimator to heterogeneous linear bandits, we can obtain an $\tilde O(d\sqrt{\sum_{k=1}^K \sigma_k^2} +d)$ regret in the first $K$ rounds, where $d$ is the dimension of the context and $\sigma_k^2$ is the variance of the reward in the $k$-th round.

Multi-Armed Bandits, Reinforcement Learning
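A weighted least-squares estimator for heterogeneous noise down-weights rounds with larger reward variance. The minimal sketch below uses plain inverse-variance weights 1/sigma_k^2 with a ridge term `lam`, and it assumes the variances are known. The paper's estimator uses its own weighting scheme, so treat this only as an illustration of the general idea. The function name is hypothetical.

```python
import numpy as np


def weighted_ridge(X, r, sigma2, lam=1.0):
    """theta = (lam*I + sum_k x_k x_k^T / sigma_k^2)^{-1} sum_k r_k x_k / sigma_k^2.

    X: (K, d) contexts, r: (K,) rewards, sigma2: (K,) per-round reward variances.
    """
    w = 1.0 / np.asarray(sigma2, dtype=float)           # inverse-variance weights
    Xw = X * w[:, None]                                  # rows scaled by their weights
    gram = lam * np.eye(X.shape[1]) + Xw.T @ X
    return np.linalg.solve(gram, Xw.T @ r)


# Usage: noiseless rewards, so a tiny ridge term recovers theta_star.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(50, 3))
sigma2 = rng.uniform(0.1, 2.0, size=50)
theta = weighted_ridge(X, X @ theta_star, sigma2, lam=1e-6)
```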

Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium

no code implementations • 10 Aug 2022 • Chris Junchi Li, Dongruo Zhou, Quanquan Gu, Michael I. Jordan

We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation, where the action-value function is approximated by a function in a Reproducing Kernel Hilbert Space (RKHS).

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

no code implementations • 12 Dec 2022 • Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu

We study reinforcement learning (RL) with linear function approximation.

Reinforcement Learning (RL)

Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

no code implementations • 21 Feb 2023 • Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

We propose a variance-adaptive algorithm for linear mixture MDPs, which achieves a problem-dependent horizon-free regret bound that can gracefully reduce to a nearly constant regret for deterministic MDPs.

Computational Efficiency, Decision Making

Risk Bounds of Accelerated SGD for Overparameterized Linear Regression

no code implementations • 23 Nov 2023 • Xuheng Li, Yihe Deng, Jingfeng Wu, Dongruo Zhou, Quanquan Gu

Additionally, when our analysis is specialized to linear regression in the strongly convex setting, it yields a tighter bound for bias error than the best-known result.

Regression

Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path

no code implementations • 14 Feb 2024 • Qiwei Di, Jiafan He, Dongruo Zhou, Quanquan Gu

Our algorithm achieves an $\tilde{\mathcal O}(dB_*\sqrt{K})$ regret bound, where $d$ is the dimension of the feature mapping in the linear transition kernel, $B_*$ is the upper bound of the total cumulative cost for the optimal policy, and $K$ is the number of episodes.

DPAdapter: Improving Differentially Private Deep Learning through Noise Tolerance Pre-training

no code implementations • 5 Mar 2024 • ZiHao Wang, Rui Zhu, Dongruo Zhou, Zhikun Zhang, John Mitchell, Haixu Tang, XiaoFeng Wang

DPAdapter modifies and enhances the sharpness-aware minimization (SAM) technique, utilizing a two-batch strategy to provide a more accurate perturbation estimate and an efficient gradient descent, thereby improving parameter robustness against noise.
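DPAdapter builds on sharpness-aware minimization (SAM). The minimal sketch below shows one step of standard SAM, not DPAdapter's modified two-batch version. It first perturbs the weights toward the locally worst-case direction within a radius `rho`. It then descends from the original weights using the gradient taken at the perturbed point. The names `grad_fn`, `lr`, and `rho` are illustrative assumptions.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One standard SAM update on parameters w, given a gradient function grad_fn."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # first-order worst-case perturbation
    g_sharp = grad_fn(w + eps)                   # gradient at the perturbed weights
    return w - lr * g_sharp                      # descend from the original weights


# Usage on L(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, 0.0])
w_next = sam_step(w, lambda u: u)  # eps = [0.05, 0], so w_next = w - 0.1 * [1.05, 0]
```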

Variance-Dependent Regret Bounds for Non-stationary Linear Bandits

no code implementations • 15 Mar 2024 • Zhiyong Wang, Jize Xie, Yi Chen, John C. S. Lui, Dongruo Zhou

We investigate the non-stationary stochastic linear bandit problem where the reward distribution evolves each round.
