no code implementations • ICLR 2019 • Digvijay Boob, Santanu S. Dey, Guanghui Lan

In this paper, we explore some basic questions on the complexity of training neural networks with the ReLU activation function.

no code implementations • 16 Oct 2023 • Tianjiao Li, Guanghui Lan

Line search (or backtracking) procedures have been widely employed in first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., the Lipschitz constant).
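As a rough illustration of the backtracking idea mentioned above, the following is a minimal sketch of gradient descent that guesses the Lipschitz constant and doubles the guess whenever a sufficient-decrease test fails. It is a generic textbook scheme, not the method of the paper; the function name, the quadratic test problem, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def backtracking_gradient_descent(f, grad, x0, L0=1.0, iters=200):
    """Gradient descent that estimates the Lipschitz constant by backtracking."""
    x = np.asarray(x0, dtype=float)
    L = L0  # running guess of the (unknown) Lipschitz constant of grad
    for _ in range(iters):
        g = grad(x)
        while True:
            x_new = x - g / L
            # Sufficient decrease implied by L-smoothness:
            # f(x_new) <= f(x) - ||g||^2 / (2L); small slack for round-off.
            if f(x_new) <= f(x) - g.dot(g) / (2.0 * L) + 1e-12:
                break
            L *= 2.0  # guess was too small: double it and retry
        x = x_new
        L = max(L / 2.0, 1e-12)  # let the estimate shrink again
    return x

# Hypothetical test problem: f(x) = 0.5 * x^T A x with A = diag(1, 10).
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x.dot(A).dot(x)
grad = lambda x: A.dot(x)
x_final = backtracking_gradient_descent(f, grad, [1.0, 1.0])
```

The caller never supplies the true Lipschitz constant (10 in this example); the inner loop discovers a workable value on the fly, at the cost of extra function evaluations.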

no code implementations • 29 Jul 2023 • Yan Li, Guanghui Lan

We adopt a policy optimization viewpoint towards policy evaluation for robust Markov decision processes with $\mathrm{s}$-rectangular ambiguity sets.

no code implementations • 4 Jul 2023 • Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, Tianjiao Li

However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size.

no code implementations • 28 Mar 2023 • Guanghui Lan, Alexander Shapiro

Optimization problems involving sequential decisions in a stochastic environment were studied in Stochastic Programming (SP), Stochastic Optimal Control (SOC) and Markov Decision Processes (MDP).

no code implementations • 8 Mar 2023 • Yan Li, Guanghui Lan

SPMD with the second evaluation operator, namely truncated on-policy Monte Carlo (TOMC), attains an $\tilde{\mathcal{O}}(\mathcal{H}_{\mathcal{D}}/\epsilon^2)$ sample complexity, where $\mathcal{H}_{\mathcal{D}}$ mildly depends on the effective horizon and the size of the action space with properly chosen Bregman divergence (e.g., Tsallis divergence).

no code implementations • 30 Nov 2022 • Guanghui Lan

We then define proper notions of the approximation errors for policy evaluation and investigate their impact on the convergence of these methods applied to general-state RL problems with either finite-action or continuous-action spaces.

no code implementations • 11 Oct 2022 • Yi Cheng, Guanghui Lan, H. Edwin Romeijn

The DNCG is the first single-loop projection-free method, with iteration complexity bounded by $\mathcal{O}\big(1/\epsilon^4\big)$ for computing a so-called $\epsilon$-Wolfe point.

no code implementations • 21 Sep 2022 • Yan Li, Guanghui Lan, Tuo Zhao

We consider the problem of solving a robust Markov decision process (MDP), which involves a set of discounted, finite-state, finite-action-space MDPs with uncertain transition kernels.

no code implementations • 11 May 2022 • Tianjiao Li, Feiyang Wu, Guanghui Lan

We study the problem of average-reward Markov decision processes (AMDPs) and develop novel first-order methods with strong theoretical guarantees for both policy evaluation and optimization.

no code implementations • 10 Mar 2022 • Guanghui Lan, Zhe Zhang

Specifically, the DRAO method achieves the optimal communication complexity by assuming a certain saddle point subproblem can be easily solved in the server node.

no code implementations • 16 Feb 2022 • Shuoguang Yang, Xudong Li, Guanghui Lan

We propose a class of efficient primal-dual algorithms to tackle the minimax expectation-constrained problem, and show that our algorithms converge at the optimal rate of $\mathcal{O}(\frac{1}{\sqrt{N}})$.

no code implementations • 24 Jan 2022 • Yan Li, Guanghui Lan, Tuo Zhao

We first establish the global linear convergence of HPMD instantiated with Kullback-Leibler divergence, for both the optimality gap, and a weighted distance to the set of optimal policies.
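For intuition about mirror descent with the Kullback-Leibler divergence over policies, here is a minimal sketch of a single KL-based update, which reduces to a multiplicative-weights (softmax) step in each state. It is a plain mirror descent step and omits the homotopic regularization that distinguishes HPMD; the function name, the cost-minimization convention, and the sample numbers are assumptions for illustration.

```python
import numpy as np

def kl_mirror_descent_step(policy, q_values, eta):
    """One KL mirror descent update per state.

    policy, q_values: arrays of shape (n_states, n_actions).
    q_values are treated as costs (lower is better), an assumed convention.
    policy must be strictly positive so that its logarithm is finite.
    """
    logits = np.log(policy) - eta * q_values
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)

# Hypothetical single-state example with two actions.
policy = np.array([[0.5, 0.5]])
q_values = np.array([[1.0, 0.0]])  # the second action is cheaper
new_policy = kl_mirror_descent_step(policy, q_values, eta=1.0)
```

Each row of the result remains a probability distribution, and probability mass shifts toward the lower-cost action.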

no code implementations • 15 Jan 2022 • Guanghui Lan, Yan Li, Tuo Zhao

Despite the nonconvex nature of the problem and a partial update rule, we provide a unified analysis for several sampling schemes, and show that BPMD achieves fast linear convergence to global optimality.

no code implementations • 24 Dec 2021 • Tianjiao Li, Guanghui Lan, Ashwin Pananjady

To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality.

no code implementations • 20 Oct 2021 • Tianjiao Li, Ziwei Guan, Shaofeng Zou, Tengyu Xu, Yingbin Liang, Guanghui Lan

Despite the challenge of the nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge to the global optimum with a complexity of $\tilde{\mathcal O}(1/\epsilon)$ in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approach by a factor of $\mathcal O(1/\epsilon)$ \citep{ding2020natural, paternain2019constrained}.

no code implementations • ICLR 2022 • Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao, Guanghui Lan

We show that incorporating frequency information of tokens in the embedding learning problems leads to provably efficient algorithms, and demonstrate that common adaptive algorithms implicitly exploit the frequency information to a large extent.

no code implementations • 30 Jan 2021 • Guanghui Lan

We further show that the complexity for computing the gradients of these regularizers, if necessary, can be bounded by ${\cal O}\{(\log_\gamma \epsilon) [(1-\gamma)L/\mu]^{1/2}\log (1/\epsilon)\}$ (resp., ${\cal O} \{(\log_\gamma \epsilon ) (L/\epsilon)^{1/2}\}$) for problems with strongly (resp., general) convex regularizers.

no code implementations • 19 Nov 2020 • Zhe Zhang, Guanghui Lan

All these complexity results seem to be new in the literature and they indicate that the convex NSCO problem has the same order of oracle complexity as those without the nested composition in all but the strongly convex and outer-non-smooth problem.

no code implementations • 15 Nov 2020 • Georgios Kotsalis, Guanghui Lan, Tianjiao Li

This brings us to the fast TD (FTD) algorithm which combines elements of CTD and the stochastic operator extrapolation method of the companion paper.

no code implementations • 11 Nov 2020 • Tengyu Xu, Yingbin Liang, Guanghui Lan

To demonstrate the theoretical performance of CRPO, we adopt natural policy gradient (NPG) for each policy update step and show that CRPO achieves an $\mathcal{O}(1/\sqrt{T})$ convergence rate to the global optimal policy in the constrained policy set and an $\mathcal{O}(1/\sqrt{T})$ error bound on constraint satisfaction.

no code implementations • 5 Nov 2020 • Georgios Kotsalis, Guanghui Lan, Tianjiao Li

In this paper we first present a novel operator extrapolation (OE) method for solving deterministic variational inequality (VI) problems.

no code implementations • NeurIPS 2020 • Digvijay Boob, Qi Deng, Guanghui Lan, Yilin Wang

We also establish new convergence complexities to achieve an approximate KKT solution when the objective can be smooth/nonsmooth, deterministic/stochastic and convex/nonconvex with complexity that is on a par with gradient descent for unconstrained optimization problems in respective cases.

no code implementations • 28 Sep 2020 • Tengyu Xu, Yingbin Liang, Guanghui Lan

To demonstrate the theoretical performance of CRPO, we adopt natural policy gradient (NPG) for each policy update step and show that CRPO achieves an $\mathcal{O}(1/\sqrt{T})$ convergence rate to the global optimal policy in the constrained policy set and an $\mathcal{O}(1/\sqrt{T})$ error bound on constraint satisfaction.

no code implementations • 30 Jun 2020 • Guanghui Lan, Edwin Romeijn, Zhiqiang Zhou

Conditional gradient methods have attracted much attention in both machine learning and optimization communities recently.
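To make the term concrete, the sketch below shows the classical conditional gradient (Frank-Wolfe) iteration on the probability simplex, where the linear minimization oracle simply picks the vertex with the smallest gradient entry. It is the textbook method with the standard open-loop stepsize, not any variant proposed in the paper; the function name and the least-squares test problem are assumptions for illustration.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, iters=200):
    """Classical Frank-Wolfe over the probability simplex (projection-free)."""
    x = np.asarray(x0, dtype=float)
    for k in range(iters):
        g = grad(x)
        # Linear minimization oracle: the minimizer of <g, s> over the
        # simplex is the vertex e_i with i = argmin_i g_i.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        gamma = 2.0 / (k + 2.0)  # classical open-loop stepsize
        x = (1.0 - gamma) * x + gamma * s  # convex combination stays feasible
    return x

# Hypothetical test problem: minimize ||x - b||^2 over the simplex.
b = np.array([0.2, 0.5, 0.3])
grad = lambda x: 2.0 * (x - b)
x_fw = frank_wolfe_simplex(grad, np.array([1.0, 0.0, 0.0]))
```

Every iterate is a convex combination of simplex vertices, so feasibility is maintained without any projection step.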

no code implementations • 3 Jun 2020 • Zi Xu, Huiling Zhang, Yang Xu, Guanghui Lan

Moreover, its gradient complexity to obtain an $\varepsilon$-stationary point of the objective function is bounded by $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp., $\mathcal{O}\left( \varepsilon ^{-4} \right)$) under the strongly convex-nonconcave (resp., convex-nonconcave) setting.

no code implementations • 16 Dec 2019 • Guanghui Lan

We then refine these basic tools and establish the iteration complexity for both deterministic and stochastic dual dynamic programming methods for solving more general multi-stage stochastic optimization problems under the standard stage-wise independence assumption.

no code implementations • 7 Aug 2019 • Digvijay Boob, Qi Deng, Guanghui Lan

For large-scale and stochastic problems, we present a more practical proximal point method in which the approximate solutions of the subproblems are computed by the aforementioned ConEx method.

1 code implementation • ICLR 2020 • Harsh Shrivastava, Xinshi Chen, Binghong Chen, Guanghui Lan, Srinvas Aluru, Han Liu, Le Song

Recently, there has been a surge of interest in learning algorithms directly from data, and in this case, in learning to map the empirical covariance to the sparse precision matrix.

no code implementations • NeurIPS 2019 • Guanghui Lan, Zhize Li, Yi Zhou

Moreover, Varag is the first accelerated randomized incremental gradient method that benefits from the strong convexity of the data-fidelity term to achieve the optimal linear convergence.

no code implementations • 9 Oct 2018 • Zhe Wang, Yi Zhou, Yingbin Liang, Guanghui Lan

However, such a successful acceleration technique has not yet been proposed for second-order algorithms in nonconvex optimization. In this paper, we apply the momentum scheme to cubic regularized (CR) Newton's method and explore the potential for acceleration.

1 code implementation • 1 Oct 2018 • Qi Deng, Yi Cheng, Guanghui Lan

More specifically, we show that diagonal scaling, initially designed to improve vanilla stochastic gradient, can be incorporated into accelerated stochastic gradient descent to achieve the optimal rate of convergence for smooth stochastic optimization.

no code implementations • 27 Sep 2018 • Digvijay Boob, Santanu S. Dey, Guanghui Lan

In this paper, we explore some basic questions on the complexity of training neural networks with the ReLU activation function.

no code implementations • 24 Sep 2018 • Guanghui Lan, Yi Zhou

In this work, we introduce an asynchronous decentralized accelerated stochastic gradient descent type of method for decentralized stochastic optimization, given that communication and synchronization are the major bottlenecks.

no code implementations • 22 Aug 2018 • Zhe Wang, Yi Zhou, Yingbin Liang, Guanghui Lan

This note considers the inexact cubic-regularized Newton's method (CR), which has been shown in \cite{Cartis2011a} to achieve the same order-level convergence rate to a second-order stationary point as the exact CR \citep{Nesterov2006}.

no code implementations • 20 Feb 2018 • Zhe Wang, Yi Zhou, Yingbin Liang, Guanghui Lan

Cubic regularization (CR) is an optimization method with emerging popularity due to its capability to escape saddle points and converge to second-order stationary solutions for nonconvex optimization.

no code implementations • ICLR 2018 • Digvijay Boob, Guanghui Lan

We essentially show that this non-singular hidden-layer matrix satisfies a "good" property for this large class of activation functions.

no code implementations • 15 Nov 2017 • Guanghui Lan, Yi Zhou

Furthermore, we demonstrate that for stochastic finite-sum optimization problems, RGEM maintains the optimal ${\cal O}(1/\epsilon)$ complexity (up to a certain logarithmic factor) in terms of the number of stochastic gradient computations, but attains an ${\cal O}(\log(1/\epsilon))$ complexity in terms of communication rounds (each round involves only one agent).

no code implementations • 30 Oct 2017 • Digvijay Boob, Guanghui Lan

We look at this problem in the setting where the number of parameters is greater than the number of sampled points.

no code implementations • 11 Jul 2017 • Guanghui Lan, Zhiqiang Zhou

We show that DSA can achieve an optimal ${\cal O}(1/\epsilon^4)$ rate of convergence in terms of the total number of required scenarios when applied to a three-stage stochastic optimization problem.

no code implementations • ICML 2017 • Guanghui Lan, Sebastian Pokutta, Yi Zhou, Daniel Zink

In this work, we introduce a conditional accelerated lazy stochastic gradient descent algorithm with an optimal number of calls to a stochastic first-order oracle and convergence rate $O\left(\frac{1}{\varepsilon^2}\right)$, improving over the projection-free, Online Frank-Wolfe based stochastic gradient descent of Hazan and Kale [2012] with convergence rate $O\left(\frac{1}{\varepsilon^4}\right)$.

no code implementations • 14 Jan 2017 • Guanghui Lan, Soomin Lee, Yi Zhou

Our major contribution is to present a new class of decentralized primal-dual type algorithms, namely the decentralized communication sliding (DCS) methods, which can skip the inter-node communications while agents solve the primal subproblems iteratively through linearizations of their local objective functions.

no code implementations • 13 Apr 2016 • Guanghui Lan, Zhiqiang Zhou

We then present a variant of CSA, namely the cooperative stochastic parameter approximation (CSPA) algorithm, to deal with the situation when the constraint is defined over problem parameters, and show that it exhibits an optimal rate of convergence similar to that of CSA.

no code implementations • 29 Aug 2015 • Saeed Ghadimi, Guanghui Lan, Hongchao Zhang

In a similar vein, we show that some well-studied techniques for nonlinear programming, e.g., Quasi-Newton iteration, can be embedded into optimal convex optimization algorithms to possibly further enhance their numerical performance.

no code implementations • 8 Jul 2015 • Guanghui Lan, Yi Zhou

We first introduce a deterministic primal-dual gradient (PDG) method that can achieve the optimal black-box iteration complexity for solving these composite optimization problems using a primal-dual termination criterion.

1 code implementation • 4 Jun 2014 • Guanghui Lan

We consider in this paper a class of composite optimization problems whose objective function is given by the summation of a general smooth and nonsmooth component, together with a relatively simple nonsmooth term.

1 code implementation • 14 Oct 2013 • Saeed Ghadimi, Guanghui Lan

We demonstrate that by properly specifying the stepsize policy, the AG method exhibits the best known rate of convergence for solving general nonconvex smooth optimization problems by using first-order information, similarly to the gradient descent method.

Optimization and Control

no code implementations • 22 Sep 2013 • Saeed Ghadimi, Guanghui Lan

In this paper, we introduce a new stochastic approximation (SA) type algorithm, namely the randomized stochastic gradient (RSG) method, for solving an important class of nonlinear (possibly nonconvex) stochastic programming (SP) problems.
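One way to picture a randomized stochastic gradient scheme is as plain stochastic gradient descent that returns a randomly selected iterate instead of the last one. The sketch below assumes a constant stepsize and a uniformly drawn output index; these choices, the function names, and the noisy nonconvex test problem are assumptions for illustration and are not taken from the abstract above.

```python
import numpy as np

def randomized_stochastic_gradient(stoch_grad, x0, step, n_iters, rng):
    """SGD that outputs a uniformly chosen iterate (assumed randomization)."""
    x = np.asarray(x0, dtype=float)
    # Draw the number of updates to perform, uniform on {0, ..., n_iters - 1},
    # so the returned point is a uniformly random iterate of the SGD run.
    r = int(rng.integers(n_iters))
    for _ in range(r):
        x = x - step * stoch_grad(x, rng)
    return x

# Hypothetical nonconvex problem: f(x) = sum(1 - cos(x_i)), gradient sin(x),
# observed through additive Gaussian noise.
def noisy_grad(x, rng):
    return np.sin(x) + 0.1 * rng.standard_normal(x.shape)

x_out = randomized_stochastic_gradient(
    noisy_grad, np.array([1.0, -2.0]), 0.1, 500, np.random.default_rng(0)
)
```

Drawing the index before running means only as many gradient steps as needed are performed, and no iterate history has to be stored.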

Papers With Code is a free resource with all data licensed under CC-BY-SA.