Search Results for author: Dongruo Zhou

Found 36 papers, 4 papers with code

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

no code implementations • 13 May 2022 • Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

We show that for both known $C$ and unknown $C$ cases, our algorithm with proper choice of hyperparameter achieves a regret that nearly matches the lower bounds.

Multi-Armed Bandits

Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

no code implementations • 28 Feb 2022 • Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu

For generalized linear bandits, we further propose an algorithm based on a follow-the-regularized-leader (FTRL) subroutine and an online-to-confidence-set conversion, which can achieve a tighter variance-dependent regret under certain conditions.

online learning

Learning Neural Contextual Bandits Through Perturbed Rewards

no code implementations • ICLR 2022 • Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang

Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvements over their classical counterparts.

Multi-Armed Bandits Representation Learning

Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

no code implementations • NeurIPS 2021 • Zixiang Chen, Dongruo Zhou, Quanquan Gu

In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima.

Linear Contextual Bandits with Adversarial Corruptions

no code implementations • NeurIPS 2021 • Heyang Zhao, Dongruo Zhou, Quanquan Gu

We study the linear contextual bandit problem in the presence of adversarial corruption, where the interaction between the player and a possibly infinite decision set is contaminated by an adversary that can corrupt the reward up to a corruption level $C$ measured by the sum of the largest alteration on rewards in each round.

Multi-Armed Bandits

Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

no code implementations • NeurIPS 2021 • Weitong Zhang, Dongruo Zhou, Quanquan Gu

By constructing a special class of linear mixture MDPs, we also prove that any reward-free algorithm needs to sample at least $\tilde \Omega(H^2d\epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy.

Model-based Reinforcement Learning reinforcement-learning

Iterative Teacher-Aware Learning

no code implementations • NeurIPS 2021 • Luyao Yuan, Dongruo Zhou, Junhong Shen, Jingdong Gao, Jeffrey L. Chen, Quanquan Gu, Ying Nian Wu, Song-Chun Zhu

Recently, the benefits of integrating this cooperative pedagogy into machine concept learning in discrete spaces have been proved by multiple works.

Variance-Aware Off-Policy Evaluation with Linear Function Approximation

no code implementations • NeurIPS 2021 • Yifei Min, Tianhao Wang, Dongruo Zhou, Quanquan Gu

We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on the offline data collected by a behavior policy.

reinforcement-learning

Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation

no code implementations • NeurIPS 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu

The uniform-PAC guarantee is the strongest possible guarantee for reinforcement learning in the literature, which can directly imply both PAC and high probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation.

reinforcement-learning

Provably Efficient Representation Learning in Low-rank Markov Decision Processes

no code implementations • 22 Jun 2021 • Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu

The success of deep reinforcement learning (DRL) is due to the power of learning a representation that is suitable for the underlying exploration and exploitation task.

reinforcement-learning Representation Learning

Pure Exploration in Kernel and Neural Bandits

no code implementations • NeurIPS 2021 • Yinglun Zhu, Dongruo Zhou, Ruoxi Jiang, Quanquan Gu, Rebecca Willett, Robert Nowak

To overcome the curse of dimensionality, we propose to adaptively embed the feature representation of each arm into a lower-dimensional space and carefully deal with the induced model misspecification.

Batched Neural Bandits

no code implementations • 25 Feb 2021 • Quanquan Gu, Amin Karbasi, Khashayar Khosravi, Vahab Mirrokni, Dongruo Zhou

In many sequential decision-making problems, the individuals are split into several batches and the decision-maker is only allowed to change her policy at the end of batches.

Decision Making

Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs

no code implementations • 17 Feb 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu

In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping, and the reward function can change arbitrarily episode by episode.

Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

no code implementations • 15 Feb 2021 • Yue Wu, Dongruo Zhou, Quanquan Gu

We study reinforcement learning in an infinite-horizon average-reward setting with linear function approximation, where the transition probability function of the underlying Markov Decision Process (MDP) admits a linear form over a feature mapping of the current state, action, and next state.

Almost Optimal Algorithms for Two-player Zero-Sum Linear Mixture Markov Games

no code implementations • 15 Feb 2021 • Zixiang Chen, Dongruo Zhou, Quanquan Gu

To assess the optimality of our algorithm, we also prove an $\tilde{\Omega}( dH\sqrt{T})$ lower bound on the regret.

Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints

no code implementations • NeurIPS 2021 • Tianhao Wang, Dongruo Zhou, Quanquan Gu

Specifically, for the batch learning model, our proposed LSVI-UCB-Batch algorithm achieves an $\tilde O(\sqrt{d^3H^3T} + dHT/B)$ regret, where $d$ is the dimension of the feature mapping, $H$ is the episode length, $T$ is the number of interactions, and $B$ is the number of batches.

reinforcement-learning

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

no code implementations • 15 Dec 2020 • Dongruo Zhou, Quanquan Gu, Csaba Szepesvari

Based on the new inequality, we propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs in the episodic undiscounted setting.

reinforcement-learning

Logarithmic Regret for Reinforcement Learning with Linear Function Approximation

no code implementations • 23 Nov 2020 • Jiafan He, Dongruo Zhou, Quanquan Gu

Reinforcement learning (RL) with linear function approximation has received increasing attention recently.

reinforcement-learning

Provable Multi-Objective Reinforcement Learning with Generative Models

no code implementations • 19 Nov 2020 • Dongruo Zhou, Jiahao Chen, Quanquan Gu

Multi-objective reinforcement learning (MORL) is an extension of ordinary, single-objective reinforcement learning (RL) that is applicable to many real-world tasks where multiple objectives exist without known relative costs.

Q-Learning reinforcement-learning

Neural Thompson Sampling

3 code implementations • ICLR 2021 • Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.

Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

no code implementations • NeurIPS 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu

We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) under the tabular setting.

reinforcement-learning

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

no code implementations • 23 Jun 2020 • Dongruo Zhou, Jiafan He, Quanquan Gu

We propose a novel algorithm that makes use of the feature mapping and obtains a $\tilde O(d\sqrt{T}/(1-\gamma)^2)$ regret, where $d$ is the dimension of the feature space, $T$ is the time horizon and $\gamma$ is the discount factor of the MDP.

reinforcement-learning

Neural Contextual Bandits with UCB-based Exploration

2 code implementations • ICML 2020 • Dongruo Zhou, Lihong Li, Quanquan Gu

To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.

Efficient Exploration Multi-Armed Bandits

Training Deep Neural Networks with Partially Adaptive Momentum

no code implementations • 25 Sep 2019 • Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Experiments on standard benchmarks show that our proposed algorithm can maintain a fast convergence rate as Adam/Amsgrad while generalizing as well as SGD in training deep neural networks.

NeuralUCB: Contextual Bandits with Neural Network-Based Exploration

no code implementations • 25 Sep 2019 • Dongruo Zhou, Lihong Li, Quanquan Gu

To the best of our knowledge, our algorithm is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.

Efficient Exploration Multi-Armed Bandits

Stochastic Recursive Variance-Reduced Cubic Regularization Methods

no code implementations • 31 Jan 2019 • Dongruo Zhou, Quanquan Gu

Built upon SRVRC, we further propose a Hessian-free SRVRC algorithm, namely SRVRC$_{\text{free}}$, which only requires stochastic gradient and Hessian-vector product computations, and achieves $\tilde O(dn\epsilon^{-2} \land d\epsilon^{-3})$ runtime complexity, where $n$ is the number of component functions in the finite-sum structure, $d$ is the problem dimension, and $\epsilon$ is the optimization precision.

Lower Bounds for Smooth Nonconvex Finite-Sum Optimization

no code implementations • 31 Jan 2019 • Dongruo Zhou, Quanquan Gu

We prove tight lower bounds for the complexity of finding an $\epsilon$-suboptimal point and an $\epsilon$-approximate stationary point in different settings, for a wide regime of the smallest eigenvalue of the Hessian of the objective function (or each component function).

Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

no code implementations • NeurIPS 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions.

Sample Efficient Stochastic Variance-Reduced Cubic Regularization Method

no code implementations • 29 Nov 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

The proposed algorithm achieves a lower sample complexity of Hessian matrix computation than existing cubic regularization based methods.

A Frank-Wolfe Framework for Efficient and Effective Adversarial Attacks

2 code implementations • ICLR 2019 • Jinghui Chen, Dongruo Zhou, Jin-Feng Yi, Quanquan Gu

Depending on how much information an adversary can access, adversarial attacks can be classified as white-box attacks or black-box attacks.

Adversarial Attack

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

no code implementations • 21 Nov 2018 • Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu

In particular, we study the binary classification problem and show that for a broad family of loss functions, with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumptions on the training data.

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

no code implementations • 16 Aug 2018 • Dongruo Zhou, Jinghui Chen, Yuan Cao, Yiqi Tang, Ziyan Yang, Quanquan Gu

In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods including AMSGrad, RMSProp and AdaGrad.

Finding Local Minima via Stochastic Nested Variance Reduction

no code implementations • 22 Jun 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

For general stochastic optimization problems, the proposed $\text{SNVRG}^{+}+\text{Neon2}^{\text{online}}$ achieves $\tilde{O}(\epsilon^{-3}+\epsilon_H^{-5}+\epsilon^{-2}\epsilon_H^{-3})$ gradient complexity, which is better than both $\text{SVRG}+\text{Neon2}^{\text{online}}$ (Allen-Zhu and Li, 2017) and Natasha2 (Allen-Zhu, 2017) in certain regimes.

Stochastic Optimization

Stochastic Nested Variance Reduction for Nonconvex Optimization

no code implementations • NeurIPS 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions.

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

5 code implementations • 18 Jun 2018 • Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Experiments on standard benchmarks show that our proposed algorithm can maintain a fast convergence rate as Adam/Amsgrad while generalizing as well as SGD in training deep neural networks.

Stochastic Variance-Reduced Cubic Regularized Newton Method

no code implementations • ICML 2018 • Dongruo Zhou, Pan Xu, Quanquan Gu

At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for the cubic regularization method.
