no code implementations • 4 Apr 2024 • Xinmeng Huang, Shuo Li, Mengxin Yu, Matteo Sesia, Hamed Hassani, Insup Lee, Osbert Bastani, Edgar Dobriban
Language Models (LMs) have shown promising performance in natural language generation.
no code implementations • 5 Feb 2024 • Boao Kong, Shuchen Zhu, Songtao Lu, Xinmeng Huang, Kun Yuan
In this paper, we introduce a single-loop decentralized SBO (D-SOBA) algorithm and establish its transient iteration complexity, which, for the first time, clarifies the joint influence of network topology and data heterogeneity on decentralized bilevel algorithms.
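A minimal single-loop sketch of the decentralized stochastic bilevel setting on a toy quadratic problem; the updates, step sizes, and hypergradient estimate below are illustrative simplifications, not the exact D-SOBA algorithm:

```python
# Single-loop decentralized bilevel sketch on a toy quadratic (illustrative,
# not the D-SOBA updates themselves).
import numpy as np

n = 8                                   # number of nodes (ring topology)
W = np.zeros((n, n))                    # doubly stochastic mixing matrix
for i in range(n):
    W[i, i] = 0.5
    W[i, (i + 1) % n] = 0.25
    W[i, (i - 1) % n] = 0.25

rng = np.random.default_rng(0)
a = rng.normal(size=n)                  # per-node data (heterogeneity)
x = np.zeros(n)                         # upper-level variable, one copy per node
y = np.zeros(n)                         # lower-level variable, one copy per node
alpha, beta = 0.05, 0.1                 # upper/lower step sizes

# Lower problem per node: y*(x) = argmin_y 0.5*(y - x - a_i)^2, so y* = x + a_i.
# Upper problem per node: f_i(x, y) = 0.5*y^2, whose hypergradient is y*(x).
for k in range(500):
    x, y = W @ x, W @ y                             # one gossip round each
    g_y = y - x - a + 0.1 * rng.normal(size=n)      # stochastic lower gradient
    y = y - beta * g_y                              # single inner step (single loop)
    hypergrad = y + 0.1 * rng.normal(size=n)        # stochastic hypergradient estimate
    x = x - alpha * hypergrad                       # upper-level step

print("consensus spread:", x.max() - x.min())       # nodes should roughly agree
print("mean solution:", x.mean())                   # optimum of mean problem: -mean(a)
```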
no code implementations • 16 Aug 2023 • Xinmeng Huang, Ping Li, Xiaoyun Li
Existing approaches either cannot accommodate arbitrary data heterogeneity or partial participation, or they require stringent conditions on compression.
no code implementations • 28 Jun 2023 • Ziheng Cheng, Xinmeng Huang, Pengfei Wu, Kun Yuan
When all clients participate in the training process, we demonstrate that incorporating momentum allows FedAvg to converge without the bounded-data-heterogeneity assumption, even when using a constant local learning rate.
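A minimal FedAvg-with-momentum sketch on a toy least-squares task; the server-side momentum placement and constants here are common choices, not necessarily the exact scheme analyzed in the paper:

```python
# FedAvg with server momentum on heterogeneous quadratic clients (illustrative).
import numpy as np

rng = np.random.default_rng(1)
n_clients, d, local_steps = 10, 5, 5
targets = rng.normal(size=(n_clients, d))     # heterogeneous client optima

x = np.zeros(d)                               # global model
m = np.zeros(d)                               # server momentum buffer
eta_local, eta_server, gamma = 0.1, 1.0, 0.9  # constant local learning rate

for rnd in range(200):
    deltas = []
    for c in range(n_clients):                # full participation
        w = x.copy()
        for _ in range(local_steps):          # local SGD on f_c(w) = 0.5||w - t_c||^2
            grad = w - targets[c] + 0.05 * rng.normal(size=d)
            w -= eta_local * grad
        deltas.append(x - w)                  # client pseudo-gradient
    m = gamma * m + np.mean(deltas, axis=0)   # momentum on the averaged update
    x -= eta_server * m

print(np.linalg.norm(x - targets.mean(axis=0)))  # distance to global optimum: small
```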
no code implementations • 9 Jun 2023 • Xinmeng Huang, Kan Xu, Donghwan Lee, Hamed Hassani, Hamsa Bastani, Edgar Dobriban
MOLAR improves the dependence of the estimation error on the data dimension, compared to independent least squares estimates.
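A toy illustration of why pooling related regression tasks can beat independent least squares, under the simplifying assumption (ours, not the paper's) that all tasks share one parameter vector; MOLAR targets the harder case where task parameters only partially agree:

```python
# Pooled vs. independent OLS under a shared-parameter simplification.
import numpy as np

rng = np.random.default_rng(8)
m, n, d = 20, 100, 30                    # tasks, samples per task, dimension
w = rng.normal(size=d)                   # shared true parameter (simplification)

estimates = []
for _ in range(m):
    X = rng.normal(size=(n, d))
    y = X @ w + rng.normal(size=n)
    estimates.append(np.linalg.lstsq(X, y, rcond=None)[0])

solo = np.linalg.norm(estimates[0] - w)             # one task's independent OLS
pooled = np.linalg.norm(np.mean(estimates, 0) - w)  # averaged across tasks
print(solo, pooled)                                 # pooled error ~sqrt(m) smaller
```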
no code implementations • NeurIPS 2023 • Yutong He, Xinmeng Huang, Kun Yuan
Our results reveal that using independent unbiased compression can reduce the total communication cost by a factor of up to $\Theta(\sqrt{\min\{n, \kappa\}})$ when all local smoothness constants are constrained by a common upper bound, where $n$ is the number of workers and $\kappa$ is the condition number of the functions being minimized.
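A minimal unbiased "rand-k" compressor, illustrating the mechanism behind this saving: because $\mathbb{E}[C(x)] = x$ and each worker draws its randomness independently, the averaged compression error shrinks with the number of workers:

```python
# Unbiased rand-k compression: keep k random coordinates, rescale by d/k.
import numpy as np

def rand_k(x, k, rng):
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)          # rescaling makes the estimate unbiased
    return out

rng = np.random.default_rng(2)
x = rng.normal(size=100)
n = 50                                   # workers compressing the same vector
avg = np.mean([rand_k(x, 10, rng) for _ in range(n)], axis=0)
print(np.linalg.norm(rand_k(x, 10, rng) - x))  # error of a single compressor
print(np.linalg.norm(avg - x))                 # much smaller after averaging
```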
no code implementations • 12 May 2023 • Yutong He, Xinmeng Huang, Yiming Chen, Wotao Yin, Kun Yuan
In this paper, we investigate the performance limit of distributed stochastic optimization algorithms employing communication compression.
1 code implementation • 31 Jan 2023 • Donghwan Lee, Behrad Moniri, Xinmeng Huang, Edgar Dobriban, Hamed Hassani
Evaluating the performance of machine learning models under distribution shift is challenging, especially when we only have unlabeled data from the shifted (target) domain, along with labeled data from the original (source) domain.
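One label-free diagnostic in this setting is the disagreement rate of two independently trained classifiers, which can be computed on unlabeled target data; the sketch below is a simplified illustration of that idea, not the paper's full methodology:

```python
# Disagreement of two independently trained models on unlabeled shifted data.
import numpy as np

rng = np.random.default_rng(3)

def train_logistic(X, y, steps=300, lr=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(steps):                      # plain full-batch gradient descent
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

d, n = 20, 500
w_true = rng.normal(size=d)
Xs = rng.normal(size=(n, d))                    # labeled source data, two splits
ys = (Xs @ w_true + rng.normal(size=n) > 0).astype(float)
w1 = train_logistic(Xs[:250], ys[:250])
w2 = train_logistic(Xs[250:], ys[250:])

Xt = 1.5 * rng.normal(size=(n, d))              # shifted target inputs, no labels
disagree = np.mean((Xt @ w1 > 0) != (Xt @ w2 > 0))
print("target disagreement rate:", disagree)    # computable without target labels
```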
no code implementations • 1 Nov 2022 • Xinmeng Huang, Kun Yuan
The main difficulties lie in gauging the effectiveness of transmitting messages between two nodes over time-varying communication links, and in establishing the lower bound when the network size is fixed (a prerequisite in stochastic optimization).
no code implementations • 14 Oct 2022 • Kun Yuan, Xinmeng Huang, Yiming Chen, Xiaohan Zhang, Yingya Zhang, Pan Pan
While Lu and De Sa (2021) recently provided an optimal rate for non-convex stochastic decentralized optimization with weight matrices defined over linear graphs, the optimal rate with general weight matrices remains unclear.
no code implementations • 8 Jun 2022 • Xinmeng Huang, Yiming Chen, Wotao Yin, Kun Yuan
We establish convergence lower bounds for algorithms using either unbiased or contractive compressors, in both unidirectional and bidirectional settings.
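For contrast with the unbiased rand-k sketch above, a minimal contractive "top-k" compressor: it is biased ($\mathbb{E}[C(x)] \neq x$ in general) but satisfies the contraction property $\|C(x) - x\|^2 \le (1 - k/d)\|x\|^2$:

```python
# Contractive top-k compression: keep the k largest-magnitude entries.
import numpy as np

def top_k(x, k):
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]     # indices of the k largest entries
    out[idx] = x[idx]
    return out

x = np.random.default_rng(4).normal(size=100)
k, d = 10, x.size
err = np.linalg.norm(top_k(x, k) - x) ** 2
print(err <= (1 - k / d) * np.linalg.norm(x) ** 2)   # True: contraction holds
```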
no code implementations • 1 Jun 2022 • Xinmeng Huang, Donghwan Lee, Edgar Dobriban, Hamed Hassani
In modern machine learning, users often have to collaborate to learn the distribution of the data.
1 code implementation • 3 Mar 2022 • Donghwan Lee, Xinmeng Huang, Hamed Hassani, Edgar Dobriban
We find that detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
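A minimal binned calibration check to make the smoothness requirement concrete: binning implicitly averages $P(Y=1 \mid \text{prediction})$ within each bin, so if that function is not smooth, no bin width can reliably expose the gap. This is an illustrative sketch, not the paper's test:

```python
# Binned calibration error on a synthetic miscalibrated model (illustrative).
import numpy as np

rng = np.random.default_rng(5)
n, n_bins = 20000, 20
f = rng.uniform(size=n)                            # predicted probabilities
y = (rng.uniform(size=n) < f**1.3).astype(float)   # true P(Y=1|f) = f^1.3: miscalibrated

bins = np.minimum((f * n_bins).astype(int), n_bins - 1)
gap = 0.0
for b in range(n_bins):
    mask = bins == b
    if mask.any():                       # |avg confidence - avg accuracy| per bin
        gap += mask.mean() * abs(f[mask].mean() - y[mask].mean())
print("binned calibration error:", gap)  # clearly above sampling noise here
```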
no code implementations • NeurIPS 2021 • Xinmeng Huang, Kun Yuan, Xianghui Mao, Wotao Yin
In this paper, we improve the convergence analysis and rates of variance reduction under without-replacement sampling orders for composite finite-sum minimization. Our results are two-fold.
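A minimal gradient-table variance-reduction sketch with a cyclic, without-replacement sampling order; this is a SAGA-style simplification for illustration, not Prox-DFinito itself:

```python
# SAGA-style variance reduction with cyclic sampling on a 1-d least-squares sum.
import numpy as np

rng = np.random.default_rng(6)
N = 50
a, b = rng.normal(size=N) + 2.0, rng.normal(size=N)
grad_table = np.zeros(N)                 # last seen gradient of each component
x, lr = 0.0, 0.01

for epoch in range(200):
    for i in range(N):                   # cyclic order: 0, 1, ..., N-1, repeat
        g_new = a[i] * (a[i] * x - b[i])
        # variance-reduced direction using the stored gradient table
        direction = g_new - grad_table[i] + grad_table.mean()
        grad_table[i] = g_new
        x -= lr * direction

x_star = np.sum(a * b) / np.sum(a**2)    # closed-form minimizer
print(abs(x - x_star))                   # ~0: variance reduction removes noise
```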
no code implementations • 17 May 2021 • Kun Yuan, Sulaiman A. Alghunaim, Xinmeng Huang
For smooth objective functions, the transient stage (which measures the number of iterations the algorithm has to experience before achieving the linear speedup stage) of D-SGD is on the order of $\Omega(n/(1-\beta)^2)$ and $\Omega(n^3/(1-\beta)^4)$ for strongly and generally convex cost functions, respectively, where $1-\beta \in (0, 1)$ is a topology-dependent quantity that approaches $0$ for a large and sparse network.
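As a concrete reading of these bounds, consider a ring graph, for which the standard spectral-gap estimate is $1-\beta = \Theta(1/n^2)$; substituting it gives:

```latex
% Transient-stage bounds instantiated on a ring, where 1 - \beta = \Theta(1/n^2):
\begin{align*}
  \text{strongly convex:} \quad
    & \Omega\!\left(\frac{n}{(1-\beta)^2}\right)
      = \Omega\!\left(n \cdot n^4\right) = \Omega(n^5), \\
  \text{generally convex:} \quad
    & \Omega\!\left(\frac{n^3}{(1-\beta)^4}\right)
      = \Omega\!\left(n^3 \cdot n^8\right) = \Omega(n^{11}).
\end{align*}
```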
no code implementations • 25 Apr 2021 • Xinmeng Huang, Kun Yuan, Xianghui Mao, Wotao Yin
In the highly data-heterogeneous scenario, Prox-DFinito with optimal cyclic sampling attains a sample-size-independent convergence rate, which, to our knowledge, is the first result matching the rate of uniform-iid sampling with variance reduction.
1 code implementation • ICCV 2021 • Kun Yuan, Yiming Chen, Xinmeng Huang, Yingya Zhang, Pan Pan, Yinghui Xu, Wotao Yin
Experimental results on a variety of computer vision tasks and models demonstrate that DecentLaM promises both efficient and high-quality training.
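A generic decentralized momentum SGD sketch on a toy quadratic, to make the setting concrete; this is plain gossip plus local momentum, not DecentLaM's exact bias-corrected update:

```python
# Decentralized momentum SGD over a ring (generic sketch, not DecentLaM).
import numpy as np

n = 8
W = np.zeros((n, n))
for i in range(n):                        # ring mixing matrix, as in the sketches above
    W[i, i], W[i, (i + 1) % n], W[i, (i - 1) % n] = 0.5, 0.25, 0.25

rng = np.random.default_rng(7)
t = rng.normal(size=n)                    # heterogeneous local optima
x, m = np.zeros(n), np.zeros(n)
lr, gamma = 0.05, 0.9

for k in range(2000):
    g = x - t + 0.1 * rng.normal(size=n)  # stochastic local gradients
    m = gamma * m + g                     # local momentum buffers
    x = W @ (x - lr * m)                  # gossip the momentum update

print(abs(x.mean() - t.mean()))           # near the average of local optima
```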