A buffered asynchronous training protocol known as FedBuff has been proposed recently which bridges the gap between synchronous and asynchronous training to mitigate stragglers and to also ensure privacy simultaneously.
While this learning-based approach is more resource-efficient than replication, it is tailored to the specific model hosted by the cloud and is particularly suitable for a small number of queries (typically less than four) and tolerating very few (mostly one) number of stragglers.
As our main contribution, we propose Basil, a fast and computationally efficient Byzantine robust algorithm for decentralized training systems, which leverages a novel sequential, memory assisted and performance-based criteria for training over a logical ring while filtering the Byzantine users.
We further propose folded Lagrange coded computing (FLCC) to incorporate the developed techniques into a specific coded computing setting.
In this work, we empirically show that the square function is not the best degree-$2$ polynomial that can replace the ReLU function even when restricting the polynomials to have integer coefficients.
We consider a collaborative learning scenario in which multiple data-owners wish to jointly train a logistic regression model, while keeping their individual datasets private from the other parties.
The state-of-the-art protocols for secure model aggregation, which are based on additive masking, require all users to quantize their model updates to the same level of quantization.
Information Theory Systems and Control Systems and Control Information Theory
This presents a major challenge for the resilience of the model against adversarial (Byzantine) users, who can manipulate the global model by modifying their local models or datasets.
Federated Learning (FL) is an exciting new paradigm that enables training a global model from data generated locally at the client nodes, without moving client data to a centralized server.
In this approach a model trained for a source task, where plenty of labeled training data is available, is used as a starting point for training a model on a related target task with only few labeled training data.
A major bottleneck in scaling federated learning to a large number of users is the overhead of secure model aggregation across many users.
How to train a machine learning model while keeping the data private and secure?
Perhaps unexpectedly, we show that QSGD maintains the fast convergence of SGD to a globally optimal model while significantly reducing the communication cost.
In particular, we propose PolyShard: ``polynomially coded sharding'' scheme that achieves information-theoretic upper bounds on the efficiency of the storage, system throughput, as well as on trust, thus enabling a truly scalable system.
Cryptography and Security Distributed, Parallel, and Cluster Computing Information Theory Information Theory
In particular, PCR requires a recovery threshold that scales inversely proportionally with the amount of computation/storage available at each worker.
We also analyze the convergence behavior of iterative encoded optimization algorithms, allowing us to characterize fundamental trade-offs between convergence rate, size of data set, accuracy, computational load (or data redundancy), and straggler toleration in this framework.
We consider the problem of computing the Fourier transform of high-dimensional vectors, distributedly over a cluster of machines consisting of a master node and multiple worker nodes, where the worker nodes can only store and process a fraction of the inputs.
We consider a large-scale matrix multiplication problem where the computation is carried out using a distributed system with a master node and multiple worker nodes, where each worker can store parts of the input matrices.
Information Theory Distributed, Parallel, and Cluster Computing Information Theory
We focus on sorting, which is the building block of many machine learning algorithms, and propose a novel distributed sorting algorithm, named Coded TeraSort, which substantially improves the execution time of the TeraSort benchmark in Hadoop MapReduce.
Distributed, Parallel, and Cluster Computing Information Theory Information Theory
A new sampling algorithm is proposed, which sequentially selects the graph nodes to be sampled, based on an aggressive search for the boundary of the signal over the graph.
Graph-based methods play an important role in unsupervised and semi-supervised learning tasks by taking into account the underlying geometry of the data set.