Search Results for author: Zhize Li

Found 30 papers, 7 papers with code

Acceleration for Compressed Gradient Descent in Distributed Optimization

no code implementations ICML 2020 Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik

Due to the high communication cost in distributed and federated learning problems, methods relying on sparsification or quantization of communicated messages are becoming increasingly popular.

Distributed Optimization, Federated Learning +1

Escaping Saddle Points in Heterogeneous Federated Learning via Distributed SGD with Communication Compression

no code implementations 29 Oct 2023 Sijin Chen, Zhize Li, Yuejie Chi

To our knowledge, Power-EF is the first distributed and compressed SGD algorithm that provably escapes saddle points in heterogeneous FL without any data homogeneity assumptions.

Federated Learning

Coresets for Vertical Federated Learning: Regularized Linear Regression and $K$-Means Clustering

1 code implementation 26 Oct 2022 Lingxiao Huang, Zhize Li, Jialin Sun, Haoyu Zhao

Vertical federated learning (VFL), where data features are stored in multiple parties distributively, is an important area in machine learning.

Clustering, Federated Learning +1

Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

no code implementations 22 Aug 2022 Zhize Li, Jian Li

We provide a clean and tight analysis of ProxSVRG+, which shows that it outperforms the deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, hence solves an open problem proposed in Reddi et al. (2016b).

SoteriaFL: A Unified Framework for Private Federated Learning with Communication Compression

1 code implementation 20 Jun 2022 Zhize Li, Haoyu Zhao, Boyue Li, Yuejie Chi

We then propose a unified framework SoteriaFL for private federated learning, which accommodates a general family of local gradient estimators including popular stochastic variance-reduced gradient methods and the state-of-the-art shifted compression scheme.

Federated Learning, Privacy Preserving

3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation

no code implementations 2 Feb 2022 Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, Zhize Li, Eduard Gorbunov

We propose and study a new class of gradient communication mechanisms for communication-efficient training -- three point compressors (3PC) -- as well as efficient distributed nonconvex optimization algorithms that can take advantage of them.

BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression

1 code implementation 31 Jan 2022 Haoyu Zhao, Boyue Li, Zhize Li, Peter Richtárik, Yuejie Chi

Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments.

Faster Rates for Compressed Federated Learning with Client-Variance Reduction

no code implementations 24 Dec 2021 Haoyu Zhao, Konstantin Burlachenko, Zhize Li, Peter Richtárik

In the convex setting, COFIG converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ communication rounds, which, to the best of our knowledge, is also the first convergence result for compression schemes that do not communicate with all the clients in each round.

Federated Learning

EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback

no code implementations 7 Oct 2021 Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtárik

First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators.

DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization

1 code implementation 4 Oct 2021 Boyue Li, Zhize Li, Yuejie Chi

Emerging applications in multi-agent environments such as internet-of-things, networked sensing, autonomous systems and federated learning, call for decentralized algorithms for finite-sum optimizations that are resource-efficient in terms of both computation and communication.

Federated Learning

ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computations

no code implementations 29 Sep 2021 Zhize Li, Slavomir Hanzely, Peter Richtárik

Avoiding any full gradient computations (which are time-consuming steps) is important in many applications as the number of data samples $n$ usually is very large.

Federated Learning

FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

no code implementations 10 Aug 2021 Haoyu Zhao, Zhize Li, Peter Richtárik

We propose a new federated learning algorithm, FedPAGE, able to further reduce the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg.

Federated Learning

CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression

no code implementations NeurIPS 2021 Zhize Li, Peter Richtárik

Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular.

Distributed Optimization, Federated Learning

A Short Note of PAGE: Optimal Convergence Rates for Nonconvex Optimization

no code implementations 17 Jun 2021 Zhize Li

In this note, we first recall the nonconvex problem setting and introduce the optimal PAGE algorithm (Li et al., ICML'21).

ANITA: An Optimal Loopless Accelerated Variance-Reduced Gradient Method

no code implementations 21 Mar 2021 Zhize Li

ii) For strongly convex finite-sum problems, we also show that ANITA can achieve the optimal convergence rate $O\big((n+\sqrt{\frac{nL}{\mu}})\log\frac{1}{\epsilon}\big)$ matching the lower bound $\Omega\big((n+\sqrt{\frac{nL}{\mu}})\log\frac{1}{\epsilon}\big)$ provided by Lan and Zhou (2015).

ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation

no code implementations 2 Mar 2021 Zhize Li, Slavomír Hanzely, Peter Richtárik

Avoiding any full gradient computations (which are time-consuming steps) is important in many applications as the number of data samples $n$ usually is very large.

Federated Learning

MARINA: Faster Non-Convex Distributed Learning with Compression

1 code implementation 15 Feb 2021 Eduard Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik

Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance.

Federated Learning

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

no code implementations 25 Aug 2020 Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtárik

Then, we show that PAGE obtains the optimal convergence results $O(n+\frac{\sqrt{n}}{\epsilon^2})$ (finite-sum) and $O(b+\frac{\sqrt{b}}{\epsilon^2})$ (online) matching our lower bounds for both nonconvex finite-sum and online problems.

A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization

no code implementations 12 Jun 2020 Zhize Li, Peter Richtárik

We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant.

Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

no code implementations 26 Feb 2020 Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik

Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular.

Federated Learning

A unified variance-reduced accelerated gradient method for convex optimization

no code implementations NeurIPS 2019 Guanghui Lan, Zhize Li, Yi Zhou

Moreover, Varag is the first accelerated randomized incremental gradient method that benefits from the strong convexity of the data-fidelity term to achieve the optimal linear convergence.

Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization

no code implementations 1 May 2019 Rong Ge, Zhize Li, Wei-Yao Wang, Xiang Wang

Variance reduction techniques like SVRG provide simple and fast algorithms for optimizing a convex finite-sum objective.

SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points

no code implementations NeurIPS 2019 Zhize Li

We emphasize that SSRGD algorithm for finding second-order stationary points is as simple as for finding first-order stationary points just by adding a uniform perturbation sometimes, while all other algorithms for finding second-order stationary points with similar gradient complexity need to combine with a negative-curvature search subroutine (e.g., Neon2 [Allen-Zhu and Li, 2018]).

Learning Two-layer Neural Networks with Symmetric Inputs

no code implementations ICLR 2019 Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang

We give a new algorithm for learning a two-layer neural network under a general class of input distributions.

Vocal Bursts Valence Prediction

A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization

no code implementations 7 Sep 2018 Zhize Li, Jian Li

Besides, if the hyperparameters (e.g., the Lipschitz smooth parameter $L$) are not available, we propose a guessing algorithm for guessing them dynamically and also prove a similar convergence rate.

Stochastic Gradient Hamiltonian Monte Carlo with Variance Reduction for Bayesian Inference

no code implementations 29 Mar 2018 Zhize Li, Tianyi Zhang, Shuyu Cheng, Jun Zhu, Jian Li

In this paper, we apply the variance reduction tricks on Hamiltonian Monte Carlo and achieve better theoretical convergence results compared with the variance-reduced Langevin dynamics.

Bayesian Inference

Gradient Boosting With Piece-Wise Linear Regression Trees

1 code implementation 15 Feb 2018 Yu Shi, Jian Li, Zhize Li

We show that PL Trees can accelerate convergence of GBDT and improve the accuracy.

Ensemble Learning, regression

A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

no code implementations NeurIPS 2018 Zhize Li, Jian Li

In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., 2017] for the smooth nonconvex case.

Optimal In-Place Suffix Sorting

2 code implementations 26 Oct 2016 Zhize Li, Jian Li, Hongwei Huo

The open problem asked to design in-place algorithms in $o(n\log n)$ time and ultimately, in $O(n)$ time for (read-only) integer alphabets with $|\Sigma| \leq n$.

Data Structures and Algorithms

On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs

no code implementations NeurIPS 2015 Wei Cao, Jian Li, Yufei Tao, Zhize Li

This paper discusses how to efficiently choose from $n$ unknown distributions the $k$ ones whose means are the greatest by a certain metric, up to a small relative error.

Multi-Armed Bandits
