Personalization addresses this issue by enabling each client to have a different model tailored to their own data while simultaneously benefiting from the other clients' data.

We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses.

Byzantine-resilient distributed machine learning seeks to achieve robust learning performance in the presence of misbehaving or adversarial workers.

Batch normalization has proven to be a very beneficial mechanism to accelerate the training and improve the accuracy of deep neural networks in centralized environments.

Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data.

It has been argued that the seemingly weaker threat model where only workers' local datasets get poisoned is more reasonable.

The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm by a \emph{robust averaging rule}.
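
As a minimal sketch of what swapping the averaging step can look like, the snippet below contrasts plain (unweighted) averaging with the coordinate-wise median. The median is only one assumed example of a robust averaging rule, chosen here for illustration; the function names and the toy updates are hypothetical and not taken from the source.

```python
from statistics import median


def average(updates):
    """Plain unweighted averaging of client updates (simplified FedAvg server step)."""
    n = len(updates)
    return [sum(coords) / n for coords in zip(*updates)]


def coordinate_wise_median(updates):
    """Illustrative robust rule: take the median of each coordinate independently."""
    return [median(coords) for coords in zip(*updates)]


# Three well-behaved clients and one adversarial client sending an extreme update.
updates = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1000.0, -1000.0]]

plain = average(updates)                   # dragged far away by the outlier
robust = coordinate_wise_median(updates)   # stays close to the honest updates
```

With these toy values a single extreme update pulls the plain average hundreds of units away from the honest updates, while the median stays within their range.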

The success of machine learning (ML) applications relies on vast datasets and distributed architectures which, as they grow, present major challenges.

We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence compared to conventional DL approaches.

SABLE leverages HTS, a novel and efficient homomorphic operator implementing the prominent coordinate-wise trimmed mean robust aggregator.
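
For orientation, here is a plaintext sketch of the coordinate-wise trimmed mean itself. It does not attempt to reproduce HTS or any homomorphic computation; the function name and the parameter `f` (number of values trimmed from each end of every coordinate) are assumptions made for illustration.

```python
def coordinate_wise_trimmed_mean(vectors, f):
    """For each coordinate, drop the f smallest and f largest values, then average the rest.

    Plaintext illustration only; `f` is an assumed trimming parameter.
    """
    n = len(vectors)
    if not 0 <= 2 * f < n:
        raise ValueError("need 0 <= 2*f < number of vectors")
    result = []
    for coords in zip(*vectors):
        kept = sorted(coords)[f:n - f]  # discard f values at each extreme
        result.append(sum(kept) / len(kept))
    return result


vectors = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [100.0, -50.0], [-100.0, 40.0]]
aggregate = coordinate_wise_trimmed_mean(vectors, f=1)
```

In this toy input the extreme values in each coordinate are discarded before averaging, so the aggregate is determined by the three middle values per coordinate.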

In this paper, we show that a class of evolutionary algorithms (EAs) inspired by the Gillespie-Orr Mutational Landscapes model for natural evolution is formally equivalent to SGD in certain settings and, in practice, is well adapted to large ANNs.

We then leverage this definition to show that a general class of gradient-free ML algorithms, $(1,\lambda)$-Evolutionary Search, can be combined with classical distributed consensus algorithms to generate gradient-free Byzantine-resilient distributed learning algorithms.
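
To make the gradient-free building block concrete, the following is a minimal single-machine sketch of a $(1,\lambda)$ evolutionary search with Gaussian mutations and a fixed step size. It covers neither the distributed consensus layer nor Byzantine resilience; the function name, the hyperparameters, and the quadratic test loss are all assumptions made for illustration.

```python
import random


def one_comma_lambda_search(loss, parent, lam=10, sigma=0.1, generations=300, seed=0):
    """(1, lambda) selection: each generation, the best of `lam` mutated offspring
    replaces the parent; the parent itself is always discarded."""
    rng = random.Random(seed)  # local generator so runs are reproducible
    for _ in range(generations):
        offspring = [
            [x + rng.gauss(0.0, sigma) for x in parent]  # Gaussian mutation
            for _ in range(lam)
        ]
        parent = min(offspring, key=loss)  # only loss values are used, no gradients
    return parent


def sphere(x):
    """Toy loss with its minimum at the origin."""
    return sum(v * v for v in x)


start = [5.0, 5.0]
found = one_comma_lambda_search(sphere, start)
```

Only comparisons of loss values drive the search, which is why no gradient computation appears anywhere in the loop.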

This prominence amplified prior concerns regarding the misuse of LLMs and led to the emergence of numerous tools to detect LLMs in the wild.

The latter amortizes the dependence on the dimension in the error (caused by adversarial workers and DP), while being agnostic to the statistical properties of the data.

Byzantine machine learning (ML) aims to ensure the resilience of distributed learning algorithms to misbehaving (or Byzantine) machines.

Storage disaggregation underlies today's cloud and is naturally complemented by pushing down some computation to storage, thus mitigating the potential network bottleneck between the storage and compute tiers.

Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase impressive performance.

We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight.

We present \emph{RESAM (RESilient Averaging of Momentums)}, a unified framework that makes it simple to establish optimal Byzantine resilience, relying only on standard machine learning assumptions.

More specifically, we prove that every gradient attack can be reduced to data poisoning, in any personalized federated learning system with PAC guarantees (which we show are both desirable and realistic).

Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning.

Be it in natural language generation or in image generation, massive performance gains have been achieved in recent years.

We prove in this paper that, perhaps surprisingly, incentivizing data misreporting is not inevitable.

This paper addresses the problem of combining Byzantine resilience with privacy in machine learning (ML).

We propose a practical method which, despite increasing the variance, reduces the variance-norm ratio, mitigating the identified weakness.

We propose Mu, a system that takes less than 1.3 microseconds to replicate a (small) request in memory, and less than a millisecond to fail over the system; this cuts the replication and fail-over latencies of the prior systems by at least 61% and 90%, respectively.

We present Garfield, a library to transparently make machine learning (ML) applications, initially built with popular (but fragile) frameworks, e.g., TensorFlow and PyTorch, Byzantine-resilient.

Then we present the first algorithm that requires k+1 CASes per call to k-CAS in the common uncontended case.

We study Byzantine collaborative learning, where $n$ nodes seek to collectively learn from each other's local data.

Federated Learning (FL) is very appealing for its privacy benefits: essentially, a global model is trained with updates computed on mobile devices while keeping the data of users local.

In this paper we tackle the challenge of making the stochastic coordinate descent algorithm differentially private.

Generative adversarial networks (GANs) are pairs of artificial neural networks that are trained against each other.

Momentum is a variant of gradient descent that has been proposed for its benefits on convergence.
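
As a small illustration, the sketch below applies one common formulation of momentum (the heavy-ball update, in which a velocity term accumulates past gradients) to a one-dimensional quadratic. The function name, the hyperparameter values, and the test function are assumptions made for this example.

```python
def gradient_descent_with_momentum(grad, x0, lr=0.1, beta=0.9, steps=300):
    """Heavy-ball momentum: the velocity is a decaying sum of past gradients."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(x)  # accumulate gradient history
        x = x - lr * velocity                 # step along the accumulated direction
    return x


def quadratic_grad(x):
    """Gradient of f(x) = (x - 3)^2, whose minimizer is x = 3."""
    return 2.0 * (x - 3.0)


x_final = gradient_descent_with_momentum(quadratic_grad, x0=0.0)
```

Setting `beta=0.0` recovers plain gradient descent, which makes the two easy to compare on the same problem.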

We moreover show that the throughput gain of LiuBei compared to another state-of-the-art Byzantine-resilient ML algorithm (that assumes network asynchrony) is 70%.

As stated in the original paper by Nakamoto, at the heart of these systems lies the problem of preventing double-spending; this is usually solved by achieving consensus on the order of transfers among the participants.

This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions.

Given $n$ workers, $f$ of which are arbitrarily malicious (Byzantine) and $m=n-f$ are not, we prove that multi-Bulyan can ensure a strong form of Byzantine resilience, as well as an ${\frac{m}{n}}$ slowdown, compared to averaging, the fastest (but non-Byzantine-resilient) rule for distributed machine learning.

The third, Minimum-Diameter Averaging (MDA), is a statistically-robust gradient aggregation rule whose goal is to tolerate Byzantine workers.
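
A brute-force sketch of the idea, as commonly described, is to pick the subset of $n-f$ gradients with the smallest diameter (largest pairwise distance) and average it. The function name and toy inputs below are hypothetical, and the exhaustive search over subsets is exponential in general, so this is only meant for small illustrative inputs.

```python
from itertools import combinations
from math import dist


def minimum_diameter_averaging(gradients, f):
    """Average the subset of n - f gradients whose maximum pairwise distance is smallest."""
    n = len(gradients)
    if not 0 <= f < n:
        raise ValueError("need 0 <= f < number of gradients")

    def diameter(subset):
        # Largest Euclidean distance between any two members (0 for a singleton).
        return max((dist(a, b) for a, b in combinations(subset, 2)), default=0.0)

    best = min(combinations(gradients, n - f), key=diameter)
    return [sum(coords) / len(best) for coords in zip(*best)]


gradients = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [50.0, 50.0]]
aggregate = minimum_diameter_averaging(gradients, f=1)
```

Here the distant gradient is excluded because any subset containing it has a much larger diameter than the subset of the three nearby gradients.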

We study fault tolerance of neural networks subject to small random neuron/weight crash failures in a probabilistic setting.

7 Jun 2018 • El Mahdi El Mhamdi, Rachid Guerraoui, Lê Nguyên Hoang, Alexandre Maurer

We first solve the problem analytically in the case of two populations, with a uniform bonus-malus on the zones where each population is a majority.

We show that when a third party, the adversary, steps into the two-party setting (agent and operator) of safely interruptible reinforcement learning, a trade-off has to be made between the probability of following the optimal policy in the limit, and the probability of escaping a dangerous situation created by the adversary.

Based on this leeway, we build a simple attack, and show experimentally that its effectiveness ranges from strong to utmost on CIFAR-10 and MNIST.

The dampening component bounds the convergence rate by adjusting to stale information through a generic gradient weighting scheme.

A standard belief about emergent collective behavior is that it arises from simple individual rules.

31 Jan 2018 • Lê Nguyên Hoang, Rachid Guerraoui

Deep learning relies on a very specific kind of neural network: those superposing several neural layers.

We propose \emph{Krum}, an aggregation rule that satisfies our resilience property, which we argue is the first provably Byzantine-resilient algorithm for distributed SGD.
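
The sketch below follows the usual description of Krum: each gradient is scored by the sum of squared distances to its $n-f-2$ nearest neighbours, and the gradient with the lowest score is selected. The function name and toy inputs are assumptions made for illustration, and the precondition $n > 2f + 2$ is included as it is commonly stated.

```python
def krum(gradients, f):
    """Return the gradient with the smallest sum of squared distances
    to its n - f - 2 closest other gradients."""
    n = len(gradients)
    if n <= 2 * f + 2:
        raise ValueError("this sketch assumes n > 2*f + 2")

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def score(i):
        distances = sorted(
            squared_distance(gradients[i], gradients[j]) for j in range(n) if j != i
        )
        return sum(distances[: n - f - 2])  # only the closest neighbours count

    return gradients[min(range(n), key=score)]


honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2]]
byzantine = [[100.0, 100.0]]
selected = krum(honest + byzantine, f=1)
```

Unlike an average, the output is one of the submitted gradients, so a single far-away vector cannot shift the result unless it is itself selected.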

In this paper, we introduce the notion of consumed item pack (CIP), which makes it possible to link users (or items) based on their implicit analogous consumption behavior.

This bound involves dependencies on the network parameters that can be seen as being too pessimistic in the average case.

We view a neural network as a distributed system of which neurons can fail independently, and we evaluate its robustness in the absence of any (recovery) learning phase.

The rise of connected personal devices, together with privacy concerns, calls for machine learning algorithms capable of leveraging the data of a large number of agents to learn personalized models under strong privacy requirements.

We give realistic sufficient conditions on the learning algorithm to enable dynamic safe interruptibility in the case of joint action learners, yet show that these conditions are not sufficient for independent learners.

The growth of data, the need for scalability and the complexity of models used in modern machine learning call for distributed implementations.

