no code implementations • 31 Oct 2023 • Rustem Islamov, Mher Safaryan, Dan Alistarh
As a by-product of our analysis, we also demonstrate convergence guarantees for gradient-type algorithms such as SGD with random reshuffling and shuffle-once mini-batch SGD.
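As an illustration of the sampling schemes involved, below is a minimal single-machine sketch of SGD with random reshuffling on a least-squares problem; the problem, step size, and function names are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def sgd_random_reshuffling(A, b, lr=0.01, epochs=100, seed=0):
    # Minimal sketch (assumed least-squares setup): SGD where each epoch
    # processes the data in a fresh random permutation. "Shuffle-once"
    # would draw one permutation before the first epoch and reuse it.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):        # fresh shuffle each epoch
            residual = A[i] @ x - b[i]
            x -= lr * residual * A[i]       # gradient of (1/2)(a_i.x - b_i)^2
    return x

# usage: recover x_true from consistent linear measurements
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
x_true = rng.standard_normal(5)
print(np.linalg.norm(sgd_random_reshuffling(A, A @ x_true) - x_true))
```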
1 code implementation • NeurIPS 2023 • Mher Safaryan, Alexandra Peste, Dan Alistarh
We show that, in the context of linear and deep linear models, knowledge distillation (KD) can be interpreted as a novel type of stochastic variance reduction mechanism.
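For intuition, here is a minimal sketch of a distillation step for a linear student under squared loss; the mixing weight `lam` and the overall setup are illustrative assumptions, and the variance-reduction interpretation itself comes from the paper's analysis, not from this snippet.

```python
import numpy as np

def kd_sgd_step(x, a_i, y_i, x_teacher, lam, lr):
    # Minimal sketch (assumed setup): a linear student trained with SGD on
    # a target mixing the hard label y_i with a linear teacher's prediction.
    # lam = 1 recovers plain SGD on the labels; lam < 1 replaces part of the
    # noisy label with the teacher's (less noisy) output.
    target = lam * y_i + (1.0 - lam) * (a_i @ x_teacher)
    grad = (a_i @ x - target) * a_i         # squared-loss gradient
    return x - lr * grad
```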
1 code implementation • 28 Oct 2022 • Artavazd Maranjyan, Mher Safaryan, Peter Richtárik
We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication.
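A minimal sketch of this local-steps template, under an assumed least-squares setup, looks as follows; only the averaging line incurs communication.

```python
import numpy as np

def local_gd(client_data, d, H=10, rounds=50, lr=0.05):
    # Minimal sketch (assumed setup): each client takes H local gradient
    # steps on its own loss f_m(x) = (1/2 n_m) ||A_m x - b_m||^2, then the
    # server averages the local iterates. One vector per client is sent
    # per round, instead of one per gradient step.
    x = np.zeros(d)
    for _ in range(rounds):
        local_iterates = []
        for A_m, b_m in client_data:
            y = x.copy()
            for _ in range(H):               # local training, no communication
                y -= lr * A_m.T @ (A_m @ y - b_m) / len(b_m)
            local_iterates.append(y)
        x = np.mean(local_iterates, axis=0)  # one communication round
    return x
```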
no code implementations • 7 Jun 2022 • Rustem Islamov, Xun Qian, Slavomír Hanzely, Mher Safaryan, Peter Richtárik
Despite their high computation and communication costs, Newton-type methods remain an appealing option for distributed training due to their robustness against ill-conditioned convex problems.
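To make the conditioning argument concrete, here is a minimal single-machine Newton iteration on l2-regularized logistic regression (an assumed example problem, not the paper's distributed method); the inverse-Hessian rescaling is what keeps the step well-behaved on ill-conditioned problems.

```python
import numpy as np

def newton_logreg(A, y, steps=10, mu=1e-3):
    # Minimal sketch: Newton's method on l2-regularized logistic regression
    # f(x) = (1/n) sum_i log(1 + exp(-y_i * a_i @ x)) + (mu/2) ||x||^2,
    # with labels y_i in {-1, +1}.
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(steps):
        s = 1.0 / (1.0 + np.exp(y * (A @ x)))     # sigmoid(-y_i * a_i.x)
        grad = -(A.T @ (y * s)) / n + mu * x
        w = s * (1.0 - s)                         # per-sample curvature
        H = (A.T * w) @ A / n + mu * np.eye(d)
        x -= np.linalg.solve(H, grad)             # rescale by inverse Hessian
    return x
```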
no code implementations • 2 Nov 2021 • Xun Qian, Rustem Islamov, Mher Safaryan, Peter Richtárik
Recent advances in distributed optimization have shown that Newton-type methods with proper communication compression mechanisms can guarantee fast local rates and low communication cost compared to first-order methods.
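One concrete low-rank compressor in this family is a Rank-R projection via truncated SVD; the sketch below is a generic illustration under that assumption, not the specific mechanism analyzed in the paper.

```python
import numpy as np

def rank_r(M, r):
    # Minimal sketch: a Rank-R compressor keeping the top-r terms of the
    # SVD of a (Hessian-like) d x d matrix M. Communicating the factors
    # costs O(d * r) numbers instead of O(d^2) for the full matrix.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]
```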
no code implementations • 7 Jun 2021 • Bokun Wang, Mher Safaryan, Peter Richtárik
To address the high communication costs of distributed machine learning, a large body of work has been devoted in recent years to designing various compression strategies, such as sparsification and quantization, and optimization algorithms capable of using them.
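Two standard examples of such compression operators, sketched generically in NumPy (illustrative, not tied to this paper's analysis): a biased Top-K sparsifier and an unbiased random-K sparsifier.

```python
import numpy as np

def top_k(v, k):
    # Keep the k largest-magnitude coordinates, zero out the rest
    # (a standard biased, contractive sparsifier).
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def rand_k(v, k, rng=None):
    # Keep k uniformly random coordinates, rescaled by len(v)/k so the
    # compressor is unbiased: E[rand_k(v)] = v.
    if rng is None:
        rng = np.random.default_rng()
    out = np.zeros_like(v)
    idx = rng.choice(len(v), size=k, replace=False)
    out[idx] = v[idx] * (len(v) / k)
    return out
```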
no code implementations • 5 Jun 2021 • Mher Safaryan, Rustem Islamov, Xun Qian, Peter Richtárik
In contrast to the aforementioned work, FedNL employs a different Hessian learning technique which i) enhances privacy, as it does not require the training data to be revealed to the coordinating server, ii) makes it applicable beyond generalized linear models, and iii) provably works with general contractive compression operators for compressing the local Hessians, such as Top-$K$ or Rank-$R$, which are vastly superior in practice.
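A heavily simplified, single-client sketch of a compressed Hessian-learning update of this flavor (the function names and the step size `alpha` are assumptions; see the paper for the actual method): only the compressed correction is ever communicated.

```python
import numpy as np

def top_k_matrix(M, k):
    # Entrywise Top-K on a matrix: keep the k largest-magnitude entries
    # (a contractive compressor applied to a d x d Hessian).
    out = np.zeros_like(M)
    flat_idx = np.argsort(np.abs(M), axis=None)[-k:]
    out.ravel()[flat_idx] = M.ravel()[flat_idx]
    return out

def hessian_learning_step(H_est, H_local, k, alpha=1.0):
    # Sketch of one round: the client sends only C(H_local - H_est), and
    # both sides fold it into the shared estimate, so neither raw data nor
    # the raw local Hessian leaves the device.
    return H_est + alpha * top_k_matrix(H_local - H_est, k)
```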
no code implementations • NeurIPS 2021 • Mher Safaryan, Filip Hanzely, Peter Richtárik
In order to further alleviate the communication burden inherent in distributed optimization, we propose a novel communication sparsification strategy that can take full advantage of the smoothness matrices associated with local losses.
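As a loose toy illustration of smoothness-aware sparsification (a hypothetical operator, not the strategy proposed in the paper), one could bias the retained coordinates toward directions with larger per-coordinate smoothness constants:

```python
import numpy as np

def smoothness_weighted_sparsify(v, L_diag, k, rng=None):
    # Hypothetical illustration (NOT the paper's operator): sample roughly
    # k coordinates with probabilities proportional to per-coordinate
    # smoothness constants L_diag, rescaling kept entries for unbiasedness.
    if rng is None:
        rng = np.random.default_rng()
    p = np.minimum(1.0, k * L_diag / L_diag.sum())  # inclusion probabilities
    mask = rng.random(len(v)) < p
    out = np.zeros_like(v)
    out[mask] = v[mask] / p[mask]                   # E[out] = v where p > 0
    return out
```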
no code implementations • 7 Oct 2020 • Alyazeed Albasyoni, Mher Safaryan, Laurent Condat, Peter Richtárik
In the average-case analysis, we design a simple compression operator, Spherical Compression, which naturally achieves the lower bound.
no code implementations • 27 Feb 2020 • Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan
In the last few years, various communication compression techniques have emerged as an indispensable tool for alleviating the communication bottleneck in distributed learning.
no code implementations • 20 Feb 2020 • Mher Safaryan, Egor Shulgin, Peter Richtárik
In designing a compression method, one aims to communicate as few bits as possible, which minimizes the cost per communication round. At the same time, one attempts to impart as little distortion (variance) to the communicated messages as possible, which minimizes the adverse effect of compression on the overall number of communication rounds.
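This trade-off can be made concrete with a standard unbiased stochastic quantizer in the QSGD style (a generic sketch, not this paper's construction): more levels `s` mean more bits per coordinate but lower variance.

```python
import numpy as np

def stochastic_quantize(v, s, rng=None):
    # Standard unbiased stochastic quantizer: each coordinate is randomly
    # rounded to one of s levels of |v_i| / ||v||. Larger s -> more bits
    # per coordinate, smaller variance E||Q(v) - v||^2, and E[Q(v)] = v.
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(v)
    if norm == 0:
        return v.copy()
    level = np.abs(v) / norm * s                  # in [0, s]
    lower = np.floor(level)
    prob = level - lower                          # round up with this probability
    rounded = lower + (rng.random(v.shape) < prob)
    return np.sign(v) * norm * rounded / s
```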
no code implementations • 25 Sep 2019 • Mher Safaryan, Peter Richtárik
Various gradient compression schemes have been proposed to mitigate the communication cost in distributed training of large-scale machine learning models.