Search Results for author: Dongruo Zhou

Found 40 papers, 5 papers with code

Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

no code implementations 21 Feb 2023 Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

We propose a variance-adaptive algorithm for linear mixture MDPs, which achieves a problem-dependent horizon-free regret bound that can gracefully reduce to a nearly constant regret for deterministic MDPs.

Decision Making, Multi-Armed Bandits

Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium

no code implementations 10 Aug 2022 Chris Junchi Li, Dongruo Zhou, Quanquan Gu, Michael I. Jordan

We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation, where the action-value function is approximated by a function in a Reproducing Kernel Hilbert Space (RKHS).

Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs

no code implementations 23 May 2022 Dongruo Zhou, Quanquan Gu

When applying our weighted least squares estimator to heterogeneous linear bandits, we can obtain an $\tilde O(d\sqrt{\sum_{k=1}^K \sigma_k^2} +d)$ regret in the first $K$ rounds, where $d$ is the dimension of the context and $\sigma_k^2$ is the variance of the reward in the $k$-th round.

Multi-Armed Bandits, reinforcement-learning

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

no code implementations 13 May 2022 Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

We show that for both known $C$ and unknown $C$ cases, our algorithm with proper choice of hyperparameter achieves a regret that nearly matches the lower bounds.

Multi-Armed Bandits

Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits

no code implementations 28 Feb 2022 Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu

We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise.

regression

Learning Neural Contextual Bandits Through Perturbed Rewards

no code implementations ICLR 2022 Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang

Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvement against their classical counterparts.

Multi-Armed Bandits, Representation Learning

Linear Contextual Bandits with Adversarial Corruptions

no code implementations NeurIPS 2021 Heyang Zhao, Dongruo Zhou, Quanquan Gu

We study the linear contextual bandit problem in the presence of adversarial corruption, where the interaction between the player and a possibly infinite decision set is contaminated by an adversary that can corrupt the reward up to a corruption level $C$ measured by the sum of the largest alteration on rewards in each round.

Multi-Armed Bandits

Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

no code implementations NeurIPS 2021 Zixiang Chen, Dongruo Zhou, Quanquan Gu

In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima.

Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

no code implementations NeurIPS 2021 Weitong Zhang, Dongruo Zhou, Quanquan Gu

By constructing a special class of linear Mixture MDPs, we also prove that for any reward-free algorithm, it needs to sample at least $\tilde \Omega(H^2d\epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy.

Model-based Reinforcement Learning, reinforcement-learning

Iterative Teacher-Aware Learning

1 code implementation NeurIPS 2021 Luyao Yuan, Dongruo Zhou, Junhong Shen, Jingdong Gao, Jeffrey L. Chen, Quanquan Gu, Ying Nian Wu, Song-Chun Zhu

Recently, the benefits of integrating this cooperative pedagogy into machine concept learning in discrete spaces have been proved by multiple works.

Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation

no code implementations NeurIPS 2021 Jiafan He, Dongruo Zhou, Quanquan Gu

The uniform-PAC guarantee is the strongest possible guarantee for reinforcement learning in the literature, which can directly imply both PAC and high probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation.

reinforcement-learning, Reinforcement Learning (RL)

Variance-Aware Off-Policy Evaluation with Linear Function Approximation

no code implementations NeurIPS 2021 Yifei Min, Tianhao Wang, Dongruo Zhou, Quanquan Gu

We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on the offline data collected by a behavior policy.

Off-policy evaluation

Pure Exploration in Kernel and Neural Bandits

no code implementations NeurIPS 2021 Yinglun Zhu, Dongruo Zhou, Ruoxi Jiang, Quanquan Gu, Rebecca Willett, Robert Nowak

To overcome the curse of dimensionality, we propose to adaptively embed the feature representation of each arm into a lower-dimensional space and carefully deal with the induced model misspecification.

Provably Efficient Representation Learning in Low-rank Markov Decision Processes

no code implementations 22 Jun 2021 Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu

The success of deep reinforcement learning (DRL) is due to the power of learning a representation that is suitable for the underlying exploration and exploitation task.

reinforcement-learning, Reinforcement Learning (RL)

Batched Neural Bandits

no code implementations 25 Feb 2021 Quanquan Gu, Amin Karbasi, Khashayar Khosravi, Vahab Mirrokni, Dongruo Zhou

In many sequential decision-making problems, the individuals are split into several batches and the decision-maker is only allowed to change her policy at the end of batches.

Decision Making

Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs

no code implementations 17 Feb 2021 Jiafan He, Dongruo Zhou, Quanquan Gu

In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping, and the reward function can change arbitrarily episode by episode.

Reinforcement Learning (RL)

Almost Optimal Algorithms for Two-player Zero-Sum Linear Mixture Markov Games

no code implementations 15 Feb 2021 Zixiang Chen, Dongruo Zhou, Quanquan Gu

To assess the optimality of our algorithm, we also prove an $\tilde{\Omega}( dH\sqrt{T})$ lower bound on the regret.

Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

no code implementations 15 Feb 2021 Yue Wu, Dongruo Zhou, Quanquan Gu

We study reinforcement learning in an infinite-horizon average-reward setting with linear function approximation, where the transition probability function of the underlying Markov Decision Process (MDP) admits a linear form over a feature mapping of the current state, action, and next state.

Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints

no code implementations NeurIPS 2021 Tianhao Wang, Dongruo Zhou, Quanquan Gu

Specifically, for the batch learning model, our proposed LSVI-UCB-Batch algorithm achieves an $\tilde O(\sqrt{d^3H^3T} + dHT/B)$ regret, where $d$ is the dimension of the feature mapping, $H$ is the episode length, $T$ is the number of interactions and $B$ is the number of batches.

reinforcement-learning, Reinforcement Learning (RL)

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

no code implementations 15 Dec 2020 Dongruo Zhou, Quanquan Gu, Csaba Szepesvari

Based on the new inequality, we propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs in the episodic undiscounted setting.

reinforcement-learning, Reinforcement Learning (RL)

Provable Multi-Objective Reinforcement Learning with Generative Models

no code implementations 19 Nov 2020 Dongruo Zhou, Jiahao Chen, Quanquan Gu

Multi-objective reinforcement learning (MORL) is an extension of ordinary, single-objective reinforcement learning (RL) that is applicable to many real-world tasks where multiple objectives exist without known relative costs.

Multi-Objective Reinforcement Learning, Q-Learning

Neural Thompson Sampling

3 code implementations ICLR 2021 Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.

Multi-Armed Bandits, Thompson Sampling

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

no code implementations 23 Jun 2020 Dongruo Zhou, Jiafan He, Quanquan Gu

We propose a novel algorithm that makes use of the feature mapping and obtains a $\tilde O(d\sqrt{T}/(1-\gamma)^2)$ regret, where $d$ is the dimension of the feature space, $T$ is the time horizon and $\gamma$ is the discount factor of the MDP.

reinforcement-learning, Reinforcement Learning (RL)

Neural Contextual Bandits with UCB-based Exploration

5 code implementations ICML 2020 Dongruo Zhou, Lihong Li, Quanquan Gu

To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.

Efficient Exploration, Multi-Armed Bandits

NeuralUCB: Contextual Bandits with Neural Network-Based Exploration

no code implementations 25 Sep 2019 Dongruo Zhou, Lihong Li, Quanquan Gu

To the best of our knowledge, our algorithm is the first neural network-based contextual bandit algorithm with near-optimal regret guarantee.

Efficient Exploration, Multi-Armed Bandits

Training Deep Neural Networks with Partially Adaptive Momentum

no code implementations 25 Sep 2019 Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Experiments on standard benchmarks show that our proposed algorithm can maintain a fast convergence rate like Adam/AMSGrad while generalizing as well as SGD in training deep neural networks.

Lower Bounds for Smooth Nonconvex Finite-Sum Optimization

no code implementations 31 Jan 2019 Dongruo Zhou, Quanquan Gu

We prove tight lower bounds for the complexity of finding an $\epsilon$-suboptimal point and an $\epsilon$-approximate stationary point in different settings, for a wide regime of the smallest eigenvalue of the Hessian of the objective function (or each component function).

Stochastic Recursive Variance-Reduced Cubic Regularization Methods

no code implementations 31 Jan 2019 Dongruo Zhou, Quanquan Gu

Built upon SRVRC, we further propose a Hessian-free SRVRC algorithm, namely SRVRC$_{\text{free}}$, which only requires stochastic gradient and Hessian-vector product computations, and achieves $\tilde O(dn\epsilon^{-2} \land d\epsilon^{-3})$ runtime complexity, where $n$ is the number of component functions in the finite-sum structure, $d$ is the problem dimension, and $\epsilon$ is the optimization precision.

Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

no code implementations NeurIPS 2018 Dongruo Zhou, Pan Xu, Quanquan Gu

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions.

Sample Efficient Stochastic Variance-Reduced Cubic Regularization Method

no code implementations 29 Nov 2018 Dongruo Zhou, Pan Xu, Quanquan Gu

The proposed algorithm achieves a lower sample complexity of Hessian matrix computation than existing cubic regularization based methods.

A Frank-Wolfe Framework for Efficient and Effective Adversarial Attacks

2 code implementations ICLR 2019 Jinghui Chen, Dongruo Zhou, Jin-Feng Yi, Quanquan Gu

Depending on how much information an adversary can access, adversarial attacks can be classified as white-box attacks and black-box attacks.

Adversarial Attack

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

no code implementations 21 Nov 2018 Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu

In particular, we study the binary classification problem and show that for a broad family of loss functions, with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumptions on the training data.

Binary Classification

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

no code implementations 16 Aug 2018 Dongruo Zhou, Jinghui Chen, Yuan Cao, Yiqi Tang, Ziyan Yang, Quanquan Gu

In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods including AMSGrad, RMSProp and AdaGrad.

Finding Local Minima via Stochastic Nested Variance Reduction

no code implementations 22 Jun 2018 Dongruo Zhou, Pan Xu, Quanquan Gu

For general stochastic optimization problems, the proposed $\text{SNVRG}^{+}+\text{Neon2}^{\text{online}}$ achieves $\tilde{O}(\epsilon^{-3}+\epsilon_H^{-5}+\epsilon^{-2}\epsilon_H^{-3})$ gradient complexity, which is better than both $\text{SVRG}+\text{Neon2}^{\text{online}}$ (Allen-Zhu and Li, 2017) and Natasha2 (Allen-Zhu, 2017) in certain regimes.

Stochastic Optimization

Stochastic Nested Variance Reduction for Nonconvex Optimization

no code implementations NeurIPS 2018 Dongruo Zhou, Pan Xu, Quanquan Gu

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions.

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

2 code implementations 18 Jun 2018 Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Experiments on standard benchmarks show that our proposed algorithm can maintain a fast convergence rate like Adam/AMSGrad while generalizing as well as SGD in training deep neural networks.

Stochastic Variance-Reduced Cubic Regularized Newton Method

no code implementations ICML 2018 Dongruo Zhou, Pan Xu, Quanquan Gu

At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for cubic regularization methods.
