no code implementations • 8 Nov 2023 • Feihu Huang
In the paper, we propose a class of efficient adaptive bilevel methods based on mirror descent for nonconvex bilevel optimization, where the upper-level problem is nonconvex, possibly with nonsmooth regularization, and the lower-level problem is also nonconvex while satisfying the Polyak-{\L}ojasiewicz (PL) condition.
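As a concrete illustration of the mirror-descent primitive these methods build on, here is a minimal sketch of one mirror-descent step with the negative-entropy mirror map, which keeps iterates on the probability simplex. The bilevel structure, the hypergradient estimator, and the adaptive step sizes from the paper are omitted; `grad` is a stand-in oracle.

```python
import numpy as np

def mirror_descent_step(x, grad, lr=0.1):
    # entropic mirror step: argmin_u <grad, u> + (1/lr) * KL(u || x),
    # i.e. the exponentiated-gradient update
    logits = np.log(x) - lr * grad
    w = np.exp(logits - logits.max())   # subtract max for numerical stability
    return w / w.sum()

x = np.ones(4) / 4                      # start at the uniform distribution
g = np.array([1.0, -0.5, 0.2, 0.0])    # stand-in (hyper)gradient
print(mirror_descent_step(x, g))       # mass shifts toward low-gradient coords
```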
no code implementations • 21 Apr 2023 • Feihu Huang, Songcan Chen
Moreover, we provide a solid convergence analysis for our DM-GDA method, and prove that it obtains a near-optimal gradient complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of the nonconvex-PL stochastic minimax problems, which reaches the lower bound of nonconvex stochastic optimization.
no code implementations • 7 Mar 2023 • Feihu Huang
To fill this gap, in the paper, we study a class of nonconvex bilevel optimization problems, where both the upper-level and lower-level problems are nonconvex, and the lower-level problem satisfies the Polyak-{\L}ojasiewicz (PL) condition.
no code implementations • 7 Mar 2023 • Feihu Huang
In the paper, we study a class of nonconvex nonconcave minimax optimization problems (i.e., $\min_x\max_y f(x, y)$), where $f(x, y)$ is possibly nonconvex in $x$, and is nonconcave but satisfies the Polyak-{\L}ojasiewicz (PL) condition in $y$.
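A minimal sketch of two-timescale gradient descent ascent (GDA) on a toy objective of this form, $f(x, y) = \sin(x)\, y - y^2/2$, which is nonconvex in $x$ and strongly concave in $y$ (a special case of the PL condition); the paper's actual algorithms and step-size rules are not reproduced here.

```python
import numpy as np

def gda(x, y, steps=500, lr_x=0.01, lr_y=0.1):
    for _ in range(steps):
        gx = np.cos(x) * y      # df/dx
        gy = np.sin(x) - y      # df/dy
        x -= lr_x * gx          # descent on x (slow timescale)
        y += lr_y * gy          # ascent on y (fast timescale)
    return x, y

x, y = gda(x=1.0, y=0.0)
print(x, y, np.sin(x) - y)      # y tracks the inner maximizer y*(x) = sin(x)
```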
no code implementations • 13 Feb 2023 • Junyi Li, Feihu Huang, Heng Huang
This matches the best known rate for first-order FL algorithms and \textbf{FedDA-MVR} is the first adaptive FL algorithm that achieves this rate.
no code implementations • 13 Feb 2023 • Junyi Li, Feihu Huang, Heng Huang
In this work, we investigate Federated Bilevel Optimization problems and propose a communication-efficient algorithm, named FedBiOAcc.
no code implementations • ICCV 2023 • Shangqian Gao, Zeyu Zhang, yanfu Zhang, Feihu Huang, Heng Huang
To mitigate this gap, we first learn a target sub-network during the model training process, and then we use this sub-network to guide the learning of model weights through partial regularization.
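One plausible reading of "partial regularization" is a penalty that pushes only the weights outside the target sub-network toward zero while the task loss trains the full model; the sketch below (PyTorch assumed) illustrates that, with a fixed stand-in 0/1 mask since the paper's procedure for learning the mask is not shown.

```python
import torch
import torch.nn as nn

def partial_reg_loss(model, masks, task_loss, lam=1e-3):
    reg = task_loss.new_zeros(())
    for name, p in model.named_parameters():
        if name in masks:
            # penalize only the weights the target sub-network does NOT keep
            reg = reg + ((1.0 - masks[name]) * p).pow(2).sum()
    return task_loss + lam * reg

model = nn.Linear(8, 4)
masks = {"weight": (torch.rand(4, 8) > 0.5).float()}   # stand-in learned mask
x, y = torch.randn(2, 8), torch.randn(2, 4)
loss = partial_reg_loss(model, masks, nn.functional.mse_loss(model(x), y))
loss.backward()
```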
no code implementations • 2 Dec 2022 • Xidong Wu, Feihu Huang, Zhengmian Hu, Heng Huang
Federated learning has attracted increasing attention with the emergence of distributed data.
no code implementations • 14 Nov 2022 • Feihu Huang, Xinrui Wang, Junyi Li, Songcan Chen
To fill this gap, in the paper, we study a class of nonconvex minimax optimization problems, and propose an efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to solve these distributed minimax problems.
no code implementations • 3 Nov 2022 • Feihu Huang
setting, and prove that our algorithms simultaneously achieve lower sample and communication complexities than the existing federated compositional algorithms.
no code implementations • 2 Nov 2022 • Feihu Huang
Thus, in the paper, we propose a novel adaptive federated bilevel optimization algorithm (i.e., AdaFBiO) to solve distributed bilevel optimization problems, where the objective function of the Upper-Level (UL) problem is possibly nonconvex, and that of the Lower-Level (LL) problem is strongly convex.
no code implementations • 14 Oct 2022 • Wenhan Xian, Feihu Huang, Heng Huang
In our theoretical analysis, we prove that our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with the communication cost of $O(k \log(d))$ at each iteration.
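A communication cost of $O(k \log(d))$ per iteration is characteristic of top-$k$ style compressors ($k$ values plus $k$ indices); whether this matches the paper's exact scheme is an assumption, but the sketch below shows the standard operator.

```python
import numpy as np

def top_k_compress(g, k):
    idx = np.argpartition(np.abs(g), -k)[-k:]   # k largest-magnitude coords
    out = np.zeros_like(g)
    out[idx] = g[idx]                           # only (idx, g[idx]) is sent
    return out

g = np.array([0.1, -2.0, 0.05, 1.5, -0.3])
print(top_k_compress(g, k=2))                   # keeps -2.0 and 1.5
```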
1 code implementation • 12 Jul 2022 • Julong Young, Junhui Chen, Feihu Huang, Jian Peng
For fine-grained time series, this leads to a bottleneck in information input and prediction output, which is detrimental to long-term series forecasting.
no code implementations • 3 May 2022 • Junyi Li, Feihu Huang, Heng Huang
Specifically, we first propose FedBiO, a deterministic gradient-based algorithm, and show that it requires $O(\epsilon^{-2})$ iterations to reach an $\epsilon$-stationary point.
no code implementations • NeurIPS 2021 • Zhengmian Hu, Feihu Huang, Heng Huang
In the paper, we study the underdamped Langevin diffusion (ULD) with a strongly convex potential consisting of a finite sum of $N$ smooth components, and propose an efficient discretization method, which requires $O(N+d^\frac{1}{3}N^\frac{2}{3}/\varepsilon^\frac{2}{3})$ gradient evaluations to achieve $\varepsilon$-error (in $\sqrt{\mathbb{E}{\lVert{\cdot}\rVert_2^2}}$ distance) for approximating the $d$-dimensional ULD.
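For orientation, here is the plain Euler-Maruyama discretization of ULD on a standard Gaussian potential $U(x) = \lVert x \rVert^2/2$; the paper's discretization is more refined (it is what yields the improved gradient-evaluation bound), so this baseline scheme is for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def uld_step(x, v, grad_U, h=0.01, gamma=2.0):
    # dv = -(gamma*v + grad U(x)) dt + sqrt(2*gamma) dW;  dx = v dt
    noise = np.sqrt(2 * gamma * h) * rng.standard_normal(x.shape)
    v = v - h * (gamma * v + grad_U(x)) + noise
    x = x + h * v
    return x, v

x, v = np.ones(3), np.zeros(3)
for _ in range(1000):
    x, v = uld_step(x, v, grad_U=lambda z: z)   # U(x) = ||x||^2 / 2
print(x)                                        # roughly a draw from N(0, I)
```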
no code implementations • NeurIPS 2021 • Feihu Huang, Xidong Wu, Heng Huang
For our stochastic algorithms, we first prove that the mini-batch stochastic mirror descent ascent (SMDA) method obtains a sample complexity of $O(\kappa^3\epsilon^{-4})$ for finding an $\epsilon$-stationary point, where $\kappa$ denotes the condition number.
no code implementations • NeurIPS 2021 • Wenhan Xian, Feihu Huang, yanfu Zhang, Heng Huang
We prove that our DM-HSGD algorithm achieves a stochastic first-order oracle (SFO) complexity of $O(\kappa^3 \epsilon^{-3})$ for finding an $\epsilon$-stationary point of the decentralized stochastic nonconvex-strongly-concave problem, which improves the existing best theoretical results.
no code implementations • 26 Jul 2021 • Feihu Huang, Junyi Li, Shangqian Gao, Heng Huang
Specifically, we propose a bilevel optimization method based on the Bregman distance (BiO-BreD) to solve deterministic bilevel problems, which achieves lower computational complexity than the best-known results.
no code implementations • 30 Jun 2021 • Feihu Huang, Xidong Wu, Zhengmian Hu
Specifically, we propose a fast Adaptive Gradient Descent Ascent (AdaGDA) method based on the basic momentum technique, which reaches a lower gradient complexity of $\tilde{O}(\kappa^4\epsilon^{-4})$ for finding an $\epsilon$-stationary point without large batches, improving the existing results of adaptive GDA methods by a factor of $O(\sqrt{\kappa})$.
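A minimal sketch of the update pattern the name suggests: momentum estimates of both gradients plus an AdaGrad-style coordinate-wise scaling on $x$. The constants and the paper's exact adaptive learning rates are placeholders.

```python
import numpy as np

def adagda_step(x, y, gx, gy, state, lr=0.1, beta=0.9, eps=1e-8):
    state["mx"] = beta * state["mx"] + (1 - beta) * gx   # momentum on x-grad
    state["my"] = beta * state["my"] + (1 - beta) * gy   # momentum on y-grad
    state["vx"] = state["vx"] + gx**2                    # AdaGrad accumulator
    x = x - lr * state["mx"] / (np.sqrt(state["vx"]) + eps)
    y = y + lr * state["my"]
    return x, y

# toy driver on f(x, y) = x*y - y^2/2 (df/dx = y, df/dy = x - y)
x, y, state = 1.0, 0.0, {"mx": 0.0, "my": 0.0, "vx": 0.0}
for _ in range(200):
    x, y = adagda_step(x, y, gx=y, gy=x - y, state=state)
```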
1 code implementation • ICLR 2022 • Feihu Huang, Shangqian Gao, Heng Huang
In the paper, we design a novel Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques.
no code implementations • 21 Jun 2021 • Feihu Huang, Junyi Li, Shangqian Gao
To fill this gap, in the paper, we propose a novel fast adaptive bilevel framework to solve stochastic bilevel optimization problems in which the outer problem is possibly nonconvex and the inner problem is strongly convex.
no code implementations • 21 Jun 2021 • Feihu Huang, Junyi Li
In the paper, we propose an effective and efficient Compositional Federated Learning (ComFedL) algorithm for a new compositional Federated Learning (FL) framework, which frequently appears in many data mining and machine learning problems with a hierarchical structure, such as distributionally robust FL and model-agnostic meta-learning (MAML).
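Setting federation aside, the compositional structure $\min_x f(g(x))$ is usually handled with a running estimate of the inner value $g(x)$, as in the stochastic compositional gradient sketch below; the ComFedL aggregation details are not reproduced and the instance is a toy.

```python
import numpy as np

def scgd_step(x, u, g_val, g_jac, f_grad, lr=0.05, beta=0.5):
    u = (1 - beta) * u + beta * g_val(x)     # running estimate of g(x)
    x = x - lr * g_jac(x).T @ f_grad(u)      # chain rule through the estimate
    return x, u

# toy instance: g(x) = A x, f(u) = 0.5 * ||u - b||^2
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
x, u = np.zeros(2), np.zeros(2)
for _ in range(500):
    x, u = scgd_step(x, u, g_val=lambda z: A @ z, g_jac=lambda z: A,
                     f_grad=lambda v: v - b)
print(A @ x - b)                             # approaches 0
```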
1 code implementation • CVPR 2021 • Shangqian Gao, Feihu Huang, Weidong Cai, Heng Huang
Specifically, we train a stand-alone neural network to predict sub-networks' performance and then maximize the output of the network as a proxy of accuracy to guide pruning.
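A minimal sketch (PyTorch assumed) of the idea: a small MLP stands in for the trained performance predictor, and we ascend its output with respect to a relaxed channel mask under a sparsity penalty. The predictor's training data and the paper's exact parameterization are omitted.

```python
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
mask_logits = torch.zeros(64, requires_grad=True)      # one logit per channel
opt = torch.optim.Adam([mask_logits], lr=0.01)

for _ in range(100):
    mask = torch.sigmoid(mask_logits)                  # relaxed 0/1 mask
    # maximize predicted accuracy while encouraging sparsity
    loss = -predictor(mask).squeeze() + 2.0 * mask.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```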
1 code implementation • NeurIPS 2021 • Feihu Huang, Junyi Li, Heng Huang
To fill this gap, we propose a faster and universal framework of adaptive gradients (i.e., SUPER-ADAM) by introducing a universal adaptive matrix that includes most existing adaptive gradient forms.
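The generic update behind such a framework is $x_{t+1} = x_t - \eta\, H_t^{-1} m_t$, where $m_t$ is a gradient estimator and the (here diagonal) adaptive matrix $H_t$ can be chosen to recover different methods; the Adam-style choice below is one instance, not the paper's full construction.

```python
import numpy as np

def super_adam_like_step(x, g, state, lr=0.01, b1=0.9, b2=0.99, eps=1e-8):
    state["m"] = b1 * state["m"] + (1 - b1) * g      # gradient estimator m_t
    state["v"] = b2 * state["v"] + (1 - b2) * g**2
    H = np.sqrt(state["v"]) + eps                    # diagonal adaptive matrix
    return x - lr * state["m"] / H

x, state = np.array([1.0, -2.0]), {"m": np.zeros(2), "v": np.zeros(2)}
for _ in range(300):
    x = super_adam_like_step(x, g=2 * x, state=state)   # toy f(x) = ||x||^2
```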
no code implementations • 9 Feb 2021 • Zhengmian Hu, Feihu Huang, Heng Huang
Moreover, our HMC methods with biased gradient estimators, such as SARAH and SARGE, require $\tilde{O}(N+\sqrt{N} \kappa^2 d^{\frac{1}{2}} \varepsilon^{-1})$ gradient complexity, which has the same dependency on the condition number $\kappa$ and dimension $d$ as the full-gradient method, but improves the dependency on the sample size $N$ by a factor of $N^{\frac{1}{2}}$.
no code implementations • 1 Jan 2021 • Shangqian Gao, Feihu Huang, Heng Huang
In this paper, we propose a novel channel pruning method to solve the problem of compression and acceleration of Convolutional Neural Networks (CNNs).
no code implementations • 13 Oct 2020 • Feihu Huang, Shangqian Gao
At the same time, we present an effective Riemannian stochastic gradient descent ascent (RSGDA) algorithm for the stochastic minimax optimization, which has a sample complexity of $O(\kappa^4\epsilon^{-4})$ for finding an $\epsilon$-stationary solution.
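A minimal sketch of one RSGDA-like step with $x$ constrained to the unit sphere: project the gradient onto the tangent space, step, and retract by renormalizing, with a plain Euclidean ascent step on $y$; step sizes and the toy objective are placeholders.

```python
import numpy as np

def rsgda_step(x, y, gx, gy, lr_x=0.1, lr_y=0.1):
    rgrad = gx - (gx @ x) * x          # Riemannian gradient (tangent projection)
    x = x - lr_x * rgrad
    x = x / np.linalg.norm(x)          # retraction back onto the sphere
    y = y + lr_y * gy                  # Euclidean ascent on y
    return x, y

# toy f(x, y) = (a @ x) * y - y^2/2 with x on the unit sphere
a = np.array([1.0, 0.0, 0.0])
x, y = np.ones(3) / np.sqrt(3), 0.0
for _ in range(500):
    x, y = rsgda_step(x, y, gx=a * y, gy=x @ a - y)
print(x @ a)                           # approaches 0: x turns orthogonal to a
```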
no code implementations • 18 Aug 2020 • Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang
Our Acc-MDA achieves a low gradient complexity of $\tilde{O}(\kappa_y^{4.5}\epsilon^{-3})$ without requiring large batches for finding an $\epsilon$-stationary point.
no code implementations • 4 Aug 2020 • Feihu Huang, Songcan Chen, Heng Huang
Our theoretical analysis shows that the online SPIDER-ADMM has an IFO complexity of $\mathcal{O}(\epsilon^{-\frac{3}{2}})$, which improves the existing best results by a factor of $\mathcal{O}(\epsilon^{-\frac{1}{2}})$.
1 code implementation • ICML 2020 • Feihu Huang, Lue Tao, Songcan Chen
To relax the large batches required in Acc-SZOFW, we further propose a novel accelerated stochastic zeroth-order Frank-Wolfe method (Acc-SZOFW*) based on the new variance-reduction technique STORM, which still reaches a function query complexity of $O(d\epsilon^{-3})$ in the stochastic problem without relying on any large batches.
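A minimal sketch of the ingredients named here: a two-point zeroth-order gradient estimator, a Frank-Wolfe step over an $\ell_1$ ball, and a STORM-style recursive estimator tying them together. The smoothing radius, batch sizes, and step sizes are illustrative only.

```python
import numpy as np

def zo_grad(f, x, u, mu=1e-4):
    # two-point finite-difference estimator along direction u
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

def fw_step_l1(x, d, radius=1.0, gamma=0.1):
    s = np.zeros_like(x)
    i = np.argmax(np.abs(d))
    s[i] = -radius * np.sign(d[i])      # linear minimization oracle on L1 ball
    return x + gamma * (s - x)

rng = np.random.default_rng(0)
f = lambda z: np.sum(z**2)              # toy objective
x = np.full(5, 0.2)                     # feasible start: ||x||_1 = 1
u = rng.standard_normal(5)
d = zo_grad(f, x, u)
for _ in range(50):
    x_prev, x = x, fw_step_l1(x, d)
    u = rng.standard_normal(5)          # one direction shared by both queries
    # STORM-style recursion: d = g(x) + (1 - a) * (d - g(x_prev))
    d = zo_grad(f, x, u) + 0.9 * (d - zo_grad(f, x_prev, u))
print(f(x))                             # typically well below the start, 0.2
```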
1 code implementation • ICML 2020 • Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang
In particular, we present a non-adaptive version of the IS-MBPG method, i.e., IS-MBPG*, which also reaches the best-known sample complexity of $O(\epsilon^{-3})$ without any large batches.
no code implementations • 30 Jul 2019 • Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang
Zeroth-order (a.k.a. derivative-free) methods are a class of effective optimization methods for solving complex machine learning problems, where gradients of the objective functions are unavailable or computationally prohibitive.
no code implementations • 29 May 2019 • Feihu Huang, Shangqian Gao, Songcan Chen, Heng Huang
In particular, our methods not only reach the best convergence rate of $O(1/T)$ for nonconvex optimization, but are also able to effectively solve many complex machine learning problems with multiple regularized penalties and constraints.
no code implementations • 16 Feb 2019 • Feihu Huang, Bin Gu, Zhouyuan Huo, Songcan Chen, Heng Huang
Proximal gradient method has been playing an important role to solve many machine learning tasks, especially for the nonsmooth problems.
no code implementations • 8 Feb 2018 • Feihu Huang, Songcan Chen
Moreover, we extend the mini-batch stochastic gradient method to both the nonconvex SVRG-ADMM and SAGA-ADMM proposed in our initial manuscript \cite{huang2016stochastic}, and prove that these mini-batch stochastic ADMMs also reach a convergence rate of $O(1/T)$ without any condition on the mini-batch size.
no code implementations • 26 Apr 2017 • Feihu Huang, Songcan Chen
To the best of our knowledge, this is the first proof that an accelerated SGD method converges linearly to a local minimum of a nonconvex optimization problem.
no code implementations • 10 Oct 2016 • Feihu Huang, Songcan Chen, Zhaosong Lu
Specifically, the first class, called nonconvex stochastic variance-reduced gradient ADMM (SVRG-ADMM), uses a multi-stage scheme to progressively reduce the variance of the stochastic gradients.
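As a reference point, here is the SVRG estimator and multi-stage loop that SVRG-ADMM builds on, applied to a toy least-squares problem; the ADMM splitting itself is omitted.

```python
import numpy as np

def svrg_estimator(grad_i, x, snapshot, full_grad, i):
    # unbiased estimate of the full gradient with reduced variance
    return grad_i(x, i) - grad_i(snapshot, i) + full_grad

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 5)), rng.standard_normal(100)
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]   # per-sample gradient

x = np.zeros(5)
for stage in range(20):                          # multi-stage scheme
    snapshot = x.copy()
    full_grad = A.T @ (A @ snapshot - b) / len(b)
    for _ in range(50):
        i = rng.integers(len(b))
        x -= 0.01 * svrg_estimator(grad_i, x, snapshot, full_grad, i)
print(np.linalg.norm(A.T @ (A @ x - b)) / len(b))   # small: near stationarity
```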