no code implementations • ICLR 2019 • Digvijay Boob, Santanu S. Dey, Guanghui Lan
In this paper, we explore some basic questions on the complexity of training neural networks with ReLU activation function.
no code implementations • 16 Oct 2023 • Tianjiao Li, Guanghui Lan
Line search (or backtracking) procedures have been widely employed in first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., the Lipschitz constant).
no code implementations • 29 Jul 2023 • Yan Li, Guanghui Lan
We adopt a policy optimization viewpoint towards policy evaluation for robust Markov decision processes with $\mathrm{s}$-rectangular ambiguity sets.
no code implementations • 4 Jul 2023 • Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, Tianjiao Li
However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size.
no code implementations • 28 Mar 2023 • Guanghui Lan, Alexander Shapiro
Optimization problems involving sequential decisions in a stochastic environment were studied in Stochastic Programming (SP), Stochastic Optimal Control (SOC) and Markov Decision Processes (MDP).
no code implementations • 8 Mar 2023 • Yan Li, Guanghui Lan
SPMD with the second evaluation operator, namely truncated on-policy Monte Carlo (TOMC), attains an $\tilde{\mathcal{O}}(\mathcal{H}_{\mathcal{D}}/\epsilon^2)$ sample complexity, where $\mathcal{H}_{\mathcal{D}}$ mildly depends on the effective horizon and the size of the action space with properly chosen Bregman divergence (e.g., Tsallis divergence).
no code implementations • 30 Nov 2022 • Guanghui Lan
We then define proper notions of the approximation errors for policy evaluation and investigate their impact on the convergence of these methods applied to general-state RL problems with either finite-action or continuous-action spaces.
no code implementations • 11 Oct 2022 • Yi Cheng, Guanghui Lan, H. Edwin Romeijn
The DNCG is the first single-loop projection-free method, with iteration complexity bounded by $\mathcal{O}\big(1/\epsilon^4\big)$ for computing a so-called $\epsilon$-Wolfe point.
no code implementations • 21 Sep 2022 • Yan Li, Guanghui Lan, Tuo Zhao
We consider the problem of solving a robust Markov decision process (MDP), which involves a set of discounted, finite-state, finite-action-space MDPs with uncertain transition kernels.
no code implementations • 11 May 2022 • Tianjiao Li, Feiyang Wu, Guanghui Lan
We study the problem of average-reward Markov decision processes (AMDPs) and develop novel first-order methods with strong theoretical guarantees for both policy evaluation and optimization.
no code implementations • 10 Mar 2022 • Guanghui Lan, Zhe Zhang
Specifically, the DRAO method achieves the optimal communication complexity by assuming a certain saddle point subproblem can be easily solved in the server node.
no code implementations • 16 Feb 2022 • Shuoguang Yang, Xudong Li, Guanghui Lan
We propose a class of efficient primal-dual algorithms to tackle the minimax expectation-constrained problem, and show that our algorithms converge at the optimal rate of $\mathcal{O}(\frac{1}{\sqrt{N}})$.
no code implementations • 24 Jan 2022 • Yan Li, Guanghui Lan, Tuo Zhao
We first establish the global linear convergence of HPMD instantiated with Kullback-Leibler divergence, for both the optimality gap, and a weighted distance to the set of optimal policies.
no code implementations • 15 Jan 2022 • Guanghui Lan, Yan Li, Tuo Zhao
Despite the nonconvex nature of the problem and a partial update rule, we provide a unified analysis for several sampling schemes, and show that BPMD achieves fast linear convergence to global optimality.
no code implementations • 24 Dec 2021 • Tianjiao Li, Guanghui Lan, Ashwin Pananjady
To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality.
no code implementations • 20 Oct 2021 • Tianjiao Li, Ziwei Guan, Shaofeng Zou, Tengyu Xu, Yingbin Liang, Guanghui Lan
Despite the challenge of the nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge to the global optimum with a complexity of $\tilde{\mathcal O}(1/\epsilon)$ in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approach by a factor of $\mathcal O(1/\epsilon)$ \citep{ding2020natural, paternain2019constrained}.
no code implementations • ICLR 2022 • Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao, Guanghui Lan
We show that incorporating frequency information of tokens in the embedding learning problems leads to provably efficient algorithms, and demonstrate that common adaptive algorithms implicitly exploit the frequency information to a large extent.
no code implementations • 30 Jan 2021 • Guanghui Lan
We further show that the complexity for computing the gradients of these regularizers, if necessary, can be bounded by ${\cal O}\{(\log_\gamma \epsilon) [(1-\gamma)L/\mu]^{1/2}\log (1/\epsilon)\}$ (resp., ${\cal O} \{(\log_\gamma \epsilon ) (L/\epsilon)^{1/2}\}$) for problems with strongly (resp., general) convex regularizers.
no code implementations • 19 Nov 2020 • Zhe Zhang, Guanghui Lan
All these complexity results seem to be new in the literature and they indicate that the convex NSCO problem has the same order of oracle complexity as those without the nested composition in all but the strongly convex and outer-non-smooth problem.
no code implementations • 15 Nov 2020 • Georgios Kotsalis, Guanghui Lan, Tianjiao Li
This brings us to the fast TD (FTD) algorithm which combines elements of CTD and the stochastic operator extrapolation method of the companion paper.
no code implementations • 11 Nov 2020 • Tengyu Xu, Yingbin Liang, Guanghui Lan
To demonstrate the theoretical performance of CRPO, we adopt natural policy gradient (NPG) for each policy update step and show that CRPO achieves an $\mathcal{O}(1/\sqrt{T})$ convergence rate to the global optimal policy in the constrained policy set and an $\mathcal{O}(1/\sqrt{T})$ error bound on constraint satisfaction.
no code implementations • 5 Nov 2020 • Georgios Kotsalis, Guanghui Lan, Tianjiao Li
In this paper we first present a novel operator extrapolation (OE) method for solving deterministic variational inequality (VI) problems.
no code implementations • NeurIPS 2020 • Digvijay Boob, Qi Deng, Guanghui Lan, Yilin Wang
We also establish new convergence complexities to achieve an approximate KKT solution when the objective can be smooth/nonsmooth, deterministic/stochastic and convex/nonconvex with complexity that is on a par with gradient descent for unconstrained optimization problems in respective cases.
no code implementations • 28 Sep 2020 • Tengyu Xu, Yingbin Liang, Guanghui Lan
To demonstrate the theoretical performance of CRPO, we adopt natural policy gradient (NPG) for each policy update step and show that CRPO achieves an $\mathcal{O}(1/\sqrt{T})$ convergence rate to the global optimal policy in the constrained policy set and an $\mathcal{O}(1/\sqrt{T})$ error bound on constraint satisfaction.
no code implementations • 30 Jun 2020 • Guanghui Lan, Edwin Romeijn, Zhiqiang Zhou
Conditional gradient methods have attracted much attention in both machine learning and optimization communities recently.
no code implementations • 3 Jun 2020 • Zi Xu, Huiling Zhang, Yang Xu, Guanghui Lan
Moreover, its gradient complexity to obtain an $\varepsilon$-stationary point of the objective function is bounded by $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp., $\mathcal{O}\left( \varepsilon ^{-4} \right)$) under the strongly convex-nonconcave (resp., convex-nonconcave) setting.
no code implementations • 16 Dec 2019 • Guanghui Lan
We then refine these basic tools and establish the iteration complexity for both deterministic and stochastic dual dynamic programming methods for solving more general multi-stage stochastic optimization problems under the standard stage-wise independence assumption.
no code implementations • 7 Aug 2019 • Digvijay Boob, Qi Deng, Guanghui Lan
For large-scale and stochastic problems, we present a more practical proximal point method in which the approximate solutions of the subproblems are computed by the aforementioned ConEx method.
1 code implementation • ICLR 2020 • Harsh Shrivastava, Xinshi Chen, Binghong Chen, Guanghui Lan, Srinivas Aluru, Han Liu, Le Song
Recently, there has been a surge of interest in learning algorithms directly from data, and in this case, learning to map the empirical covariance to the sparse precision matrix.
no code implementations • NeurIPS 2019 • Guanghui Lan, Zhize Li, Yi Zhou
Moreover, Varag is the first accelerated randomized incremental gradient method that benefits from the strong convexity of the data-fidelity term to achieve the optimal linear convergence.
no code implementations • 9 Oct 2018 • Zhe Wang, Yi Zhou, Yingbin Liang, Guanghui Lan
However, such a successful acceleration technique has not yet been proposed for second-order algorithms in nonconvex optimization. In this paper, we apply the momentum scheme to cubic regularized (CR) Newton's method and explore the potential for acceleration.
1 code implementation • 1 Oct 2018 • Qi Deng, Yi Cheng, Guanghui Lan
More specifically, we show that diagonal scaling, initially designed to improve vanilla stochastic gradient, can be incorporated into accelerated stochastic gradient descent to achieve the optimal rate of convergence for smooth stochastic optimization.
no code implementations • 27 Sep 2018 • Digvijay Boob, Santanu S. Dey, Guanghui Lan
In this paper, we explore some basic questions on the complexity of training neural networks with ReLU activation function.
no code implementations • 24 Sep 2018 • Guanghui Lan, Yi Zhou
In this work, we introduce an asynchronous decentralized accelerated stochastic gradient descent type of method for decentralized stochastic optimization, considering that communication and synchronization are the major bottlenecks.
no code implementations • 22 Aug 2018 • Zhe Wang, Yi Zhou, Yingbin Liang, Guanghui Lan
This note considers the inexact cubic-regularized Newton's method (CR), which has been shown in \cite{Cartis2011a} to achieve the same order-level convergence rate to a second-order stationary point as the exact CR \citep{Nesterov2006}.
no code implementations • 20 Feb 2018 • Zhe Wang, Yi Zhou, Yingbin Liang, Guanghui Lan
Cubic regularization (CR) is an optimization method with emerging popularity due to its capability to escape saddle points and converge to second-order stationary solutions for nonconvex optimization.
no code implementations • ICLR 2018 • Digvijay Boob, Guanghui Lan
We essentially show that these non-singular hidden layer matrices satisfy a "good" property for this large class of activation functions.
no code implementations • 15 Nov 2017 • Guanghui Lan, Yi Zhou
Furthermore, we demonstrate that for stochastic finite-sum optimization problems, RGEM maintains the optimal ${\cal O}(1/\epsilon)$ complexity (up to a certain logarithmic factor) in terms of the number of stochastic gradient computations, but attains an ${\cal O}(\log(1/\epsilon))$ complexity in terms of communication rounds (each round involves only one agent).
no code implementations • 30 Oct 2017 • Digvijay Boob, Guanghui Lan
We look at this problem in the setting where the number of parameters is greater than the number of sampled points.
no code implementations • 11 Jul 2017 • Guanghui Lan, Zhiqiang Zhou
We show that DSA can achieve an optimal ${\cal O}(1/\epsilon^4)$ rate of convergence in terms of the total number of required scenarios when applied to a three-stage stochastic optimization problem.
no code implementations • ICML 2017 • Guanghui Lan, Sebastian Pokutta, Yi Zhou, Daniel Zink
In this work we introduce a conditional accelerated lazy stochastic gradient descent algorithm with an optimal number of calls to a stochastic first-order oracle and convergence rate $O\left(\frac{1}{\varepsilon^2}\right)$, improving over the projection-free, Online Frank-Wolfe based stochastic gradient descent of Hazan and Kale [2012] with convergence rate $O\left(\frac{1}{\varepsilon^4}\right)$.
no code implementations • 14 Jan 2017 • Guanghui Lan, Soomin Lee, Yi Zhou
Our major contribution is to present a new class of decentralized primal-dual type algorithms, namely the decentralized communication sliding (DCS) methods, which can skip the inter-node communications while agents solve the primal subproblems iteratively through linearizations of their local objective functions.
no code implementations • 13 Apr 2016 • Guanghui Lan, Zhiqiang Zhou
We then present a variant of CSA, namely the cooperative stochastic parameter approximation (CSPA) algorithm, to deal with the situation when the constraint is defined over problem parameters, and show that it exhibits a similar optimal rate of convergence to CSA.
no code implementations • 29 Aug 2015 • Saeed Ghadimi, Guanghui Lan, Hongchao Zhang
In a similar vein, we show that some well-studied techniques for nonlinear programming, e.g., quasi-Newton iteration, can be embedded into optimal convex optimization algorithms to possibly further enhance their numerical performance.
no code implementations • 8 Jul 2015 • Guanghui Lan, Yi Zhou
We first introduce a deterministic primal-dual gradient (PDG) method that can achieve the optimal black-box iteration complexity for solving these composite optimization problems using a primal-dual termination criterion.
1 code implementation • 4 Jun 2014 • Guanghui Lan
We consider in this paper a class of composite optimization problems whose objective function is given by the summation of a general smooth and nonsmooth component, together with a relatively simple nonsmooth term.
1 code implementation • 14 Oct 2013 • Saeed Ghadimi, Guanghui Lan
We demonstrate that by properly specifying the stepsize policy, the AG method exhibits the best known rate of convergence for solving general nonconvex smooth optimization problems by using first-order information, similarly to the gradient descent method.
Optimization and Control
no code implementations • 22 Sep 2013 • Saeed Ghadimi, Guanghui Lan
In this paper, we introduce a new stochastic approximation (SA) type algorithm, namely the randomized stochastic gradient (RSG) method, for solving an important class of nonlinear (possibly nonconvex) stochastic programming (SP) problems.