Search Results for author: Zachary Charles

Found 28 papers, 12 papers with code

Fine-Tuning Large Language Models with User-Level Differential Privacy

no code implementations • 10 Jul 2024 • Zachary Charles, Arun Ganesh, Ryan McKenna, H. Brendan McMahan, Nicole Mitchell, Krishna Pillutla, Keith Rush

We investigate practical and scalable algorithms for training large language models (LLMs) with user-level differential privacy (DP) in order to provably safeguard all the examples contributed by each user.
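
A minimal sketch of the generic user-level DP recipe this abstract alludes to (clip each user's aggregated update, then add Gaussian noise scaled to the clip norm), not the authors' algorithm; the clip_norm and noise_multiplier parameters are illustrative assumptions.

```python
import numpy as np

def clip_by_norm(update, clip_norm):
    """Scale a user's update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def user_level_dp_mean(user_updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Average per-user updates with user-level clipping and Gaussian noise.

    Clipping bounds each user's total contribution, so the added noise masks the
    presence or absence of *all* of a user's examples, not just a single example.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = [clip_by_norm(u, clip_norm) for u in user_updates]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(user_updates)

# Example: three users, each contributing a 4-dimensional model delta.
updates = [np.ones(4), 2 * np.ones(4), -np.ones(4)]
print(user_level_dp_mean(updates, clip_norm=1.0, noise_multiplier=0.5))
```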

DrJAX: Scalable and Differentiable MapReduce Primitives in JAX

1 code implementation • 11 Mar 2024 • Keith Rush, Zachary Charles, Zachary Garrett, Sean Augenstein, Nicole Mitchell

We present DrJAX, a JAX-based library designed to support large-scale distributed and parallel machine learning algorithms that use MapReduce-style operations.

Federated Learning
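
The sketch below is not DrJAX's API; it only illustrates the MapReduce pattern in plain JAX, mapping a per-client computation over a leading clients axis with jax.vmap and reducing by averaging.

```python
import jax
import jax.numpy as jnp

def client_grad(params, x, y):
    """Per-client 'map' step: gradient of a squared-error loss on one client's batch."""
    loss = lambda p: jnp.mean((x @ p - y) ** 2)
    return jax.grad(loss)(params)

def map_reduce_round(params, clients_x, clients_y):
    """Map the per-client computation over a leading clients axis, then reduce by averaging."""
    per_client = jax.vmap(client_grad, in_axes=(None, 0, 0))(params, clients_x, clients_y)
    return jnp.mean(per_client, axis=0)

# Toy data: 8 clients, each holding 16 examples of dimension 4.
key = jax.random.PRNGKey(0)
clients_x = jax.random.normal(key, (8, 16, 4))
clients_y = jax.random.normal(key, (8, 16))
print(map_reduce_round(jnp.zeros(4), clients_x, clients_y).shape)  # (4,)
```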

Leveraging Function Space Aggregation for Federated Learning at Scale

no code implementations • 17 Nov 2023 • Nikita Dhawan, Nicole Mitchell, Zachary Charles, Zachary Garrett, Gintare Karolina Dziugaite

Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization.

Distributed Optimization Federated Learning
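
As a reference point for the direct weighted averaging described above (the FedAvg baseline, not the paper's function-space aggregation), a minimal sketch:

```python
import numpy as np

def fedavg_aggregate(client_params, client_num_examples):
    """Weighted average of client model parameters, as in canonical FedAvg.

    Each client's parameters are weighted by the number of examples it trained on.
    """
    weights = np.asarray(client_num_examples, dtype=float)
    weights /= weights.sum()
    stacked = np.stack(client_params, axis=0)       # shape: (num_clients, num_params)
    return np.tensordot(weights, stacked, axes=1)   # shape: (num_params,)

# Example: three clients with different dataset sizes.
params = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
print(fedavg_aggregate(params, [10, 30, 60]))  # pulled toward the larger clients
```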

Federated Automatic Differentiation

no code implementations • 18 Jan 2023 • Keith Rush, Zachary Charles, Zachary Garrett

We propose a federated automatic differentiation (FAD) framework that 1) enables computing derivatives of functions involving client and server computation as well as communication between them and 2) operates in a manner compatible with existing federated technology.

FAD Federated Learning +1
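
An illustrative sketch, not the FAD framework itself: once a round of client computation, communication, and server computation is written as an ordinary function, standard autodiff (here jax.grad) can differentiate through it, for example with respect to a server learning rate.

```python
import jax
import jax.numpy as jnp

def federated_round(server_lr, params, clients_x, clients_y):
    """One simplified round: clients compute gradients, the server averages and applies them."""
    def client_grad(x, y):
        loss = lambda p: jnp.mean((x @ p - y) ** 2)
        return jax.grad(loss)(params)
    client_grads = jax.vmap(client_grad)(clients_x, clients_y)  # broadcast + client work
    avg_grad = jnp.mean(client_grads, axis=0)                   # aggregate
    return params - server_lr * avg_grad                        # server update

def post_round_loss(server_lr, params, clients_x, clients_y, val_x, val_y):
    new_params = federated_round(server_lr, params, clients_x, clients_y)
    return jnp.mean((val_x @ new_params - val_y) ** 2)

# Differentiate the post-round validation loss with respect to the server learning rate.
key = jax.random.PRNGKey(0)
clients_x = jax.random.normal(key, (4, 8, 3))
clients_y = jax.random.normal(key, (4, 8))
val_x = jax.random.normal(key, (16, 3))
val_y = jax.random.normal(key, (16,))
print(jax.grad(post_round_loss)(0.1, jnp.zeros(3), clients_x, clients_y, val_x, val_y))
```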

Optimizing the Communication-Accuracy Trade-off in Federated Learning with Rate-Distortion Theory

1 code implementation • 7 Jan 2022 • Nicole Mitchell, Johannes Ballé, Zachary Charles, Jakub Konečný

A significant bottleneck in federated learning (FL) is the network communication cost of sending model updates from client devices to the central server.

Federated Learning Quantization
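
A minimal sketch of the communication/accuracy trade-off using plain uniform scalar quantization of a model update; the paper's rate-distortion-optimized scheme is more sophisticated than this.

```python
import numpy as np

def quantize(update, num_bits=8):
    """Uniformly quantize a model update to num_bits per coordinate."""
    levels = 2 ** num_bits - 1
    lo, hi = update.min(), update.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((update - lo) / scale).astype(np.uint32)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return codes * scale + lo

rng = np.random.default_rng(0)
update = rng.normal(size=1000).astype(np.float32)
for bits in (2, 4, 8):
    codes, lo, scale = quantize(update, bits)
    err = np.mean((dequantize(codes, lo, scale) - update) ** 2)
    print(f"{bits} bits/coord -> mean squared distortion {err:.2e}")
```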

Iterated Vector Fields and Conservatism, with Applications to Federated Learning

no code implementations • 8 Sep 2021 • Zachary Charles, Keith Rush

In the context of federated learning, we show that when clients have loss functions whose gradients satisfy this condition, federated averaging is equivalent to gradient descent on a surrogate loss function.

Federated Learning
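
A hedged sketch of the objects involved, with notation assumed rather than taken from the paper:

```latex
% Setup sketch (notation assumed). One local gradient step on client i's loss f_i
% with step size eta, and its k-fold iterate:
\[
  \psi_i(x) = x - \eta \nabla f_i(x),
  \qquad
  \psi_i^{(k)} = \underbrace{\psi_i \circ \cdots \circ \psi_i}_{k \text{ times}} .
\]
% Federated averaging with k local steps sends the server model to the average of
% the clients' iterated maps:
\[
  x^{+} = \frac{1}{n} \sum_{i=1}^{n} \psi_i^{(k)}(x).
\]
% If the iterated vector fields x - \psi_i^{(k)}(x) are conservative (gradients of
% potentials \tilde f_i), this update is gradient descent on the surrogate loss
% \tilde f = (1/n) \sum_i \tilde f_i.
```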

Local Adaptivity in Federated Learning: Convergence and Consistency

no code implementations • 4 Jun 2021 • Jianyu Wang, Zheng Xu, Zachary Garrett, Zachary Charles, Luyang Liu, Gauri Joshi

Popular optimization algorithms of FL use vanilla (stochastic) gradient descent for both local updates at clients and global updates at the aggregating server.

Federated Learning

Convergence and Accuracy Trade-Offs in Federated Learning and Meta-Learning

no code implementations • 8 Mar 2021 • Zachary Charles, Jakub Konečný

Using these insights, we are able to compare local update methods based on their convergence/accuracy trade-off, not just their convergence to critical points of the empirical loss.

Federated Learning Meta-Learning

On the Outsized Importance of Learning Rates in Local Update Methods

1 code implementation • 2 Jul 2020 • Zachary Charles, Jakub Konečný

We study a family of algorithms, which we refer to as local update methods, that generalize many federated learning and meta-learning algorithms.

Federated Learning Meta-Learning
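
A minimal sketch of the generic local update template (assumed notation, not the paper's exact formalism): each client runs a few local gradient steps from the shared model and the server moves toward the average client model; FedAvg and Reptile-style meta-learning fit this template.

```python
import numpy as np

def local_update_round(params, client_grad_fns, num_local_steps=5,
                       client_lr=0.1, server_lr=1.0):
    """One round of a generic local update method.

    Each client runs num_local_steps of gradient descent from the shared model;
    the server then moves toward the average client model with step server_lr.
    FedAvg (server_lr = 1) and Reptile-style meta-learning are instances.
    """
    client_models = []
    for grad_fn in client_grad_fns:
        local = params.copy()
        for _ in range(num_local_steps):
            local = local - client_lr * grad_fn(local)
        client_models.append(local)
    avg_model = np.mean(client_models, axis=0)
    return params + server_lr * (avg_model - params)

# Two quadratic clients with different minimizers.
client_grads = [lambda w: w - np.array([1.0, 0.0]),
                lambda w: w - np.array([0.0, 1.0])]
print(local_update_round(np.zeros(2), client_grads, client_lr=0.5))
```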

Adaptive Federated Optimization

6 code implementations • ICLR 2021 • Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, H. Brendan McMahan

Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data.

Federated Learning
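
A minimal sketch of server-side adaptive optimization in this spirit: the averaged client delta is treated as a pseudo-gradient and fed to an Adam-style server optimizer; the hyperparameter values here are illustrative, not the paper's.

```python
import numpy as np

class AdaptiveServer:
    """Adam-style server optimizer that treats averaged client deltas as pseudo-gradients."""

    def __init__(self, params, lr=0.01, beta1=0.9, beta2=0.99, eps=1e-3):
        self.params = np.asarray(params, dtype=float)
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros_like(self.params)   # first moment of the pseudo-gradient
        self.v = np.zeros_like(self.params)   # second moment of the pseudo-gradient

    def apply_round(self, client_deltas):
        # The averaged client delta already points in a descent direction, so the server adds it.
        delta = np.mean(client_deltas, axis=0)
        self.m = self.beta1 * self.m + (1 - self.beta1) * delta
        self.v = self.beta2 * self.v + (1 - self.beta2) * delta ** 2
        self.params = self.params + self.lr * self.m / (np.sqrt(self.v) + self.eps)
        return self.params

server = AdaptiveServer(np.zeros(3))
print(server.apply_round([np.array([0.1, -0.2, 0.3]), np.array([0.2, -0.1, 0.1])]))
```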

DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation

1 code implementation • NeurIPS 2019 • Shashank Rajput, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

In this work, we present DETOX, a Byzantine-resilient distributed training framework that combines algorithmic redundancy with robust aggregation.
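
A rough sketch of the redundancy-plus-robust-aggregation idea (the exact DETOX stages differ in detail): gradient tasks are replicated within small worker groups, each group is filtered with a coordinate-wise median, and the group outputs are combined with a second robust aggregator.

```python
import numpy as np

def redundant_robust_aggregate(worker_grads, group_size=3):
    """Two-stage aggregation: filter inside redundant groups, then robustly combine groups.

    Workers are partitioned into groups of group_size that (by construction) were
    assigned the same gradient computation, so a per-group coordinate-wise median
    filters a minority of corrupted replicas; a second median across the group
    outputs provides the final robust estimate.
    """
    grads = np.asarray(worker_grads, dtype=float)
    num_groups = len(grads) // group_size
    grouped = grads[: num_groups * group_size].reshape(num_groups, group_size, -1)
    group_estimates = np.median(grouped, axis=1)   # stage 1: within-group filtering
    return np.median(group_estimates, axis=0)      # stage 2: cross-group robust aggregation

# 9 workers in 3 groups; one Byzantine worker sends a wildly corrupted gradient.
rng = np.random.default_rng(0)
true_grad = np.array([1.0, -1.0])
grads = [true_grad + 0.01 * rng.normal(size=2) for _ in range(9)]
grads[4] = np.array([100.0, 100.0])
print(redundant_robust_aggregate(grads))           # stays close to [1, -1]
```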

Convergence and Margin of Adversarial Training on Separable Data

no code implementations • 22 May 2019 • Zachary Charles, Shashank Rajput, Stephen Wright, Dimitris Papailiopoulos

Our results are derived by showing that adversarial training with gradient updates minimizes a robust version of the empirical risk at a $\mathcal{O}(\ln(t)^2/t)$ rate, despite non-smoothness.
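
For reference, the standard form of the robust empirical risk being minimized (the norm-ball threat model is assumed notation, not quoted from the paper):

```latex
% Adversarial training minimizes the robust empirical risk
\[
  R_{\mathrm{rob}}(w) = \frac{1}{n} \sum_{i=1}^{n}
      \max_{\|\delta_i\| \le \epsilon} \ell\big(w;\, x_i + \delta_i,\, y_i\big),
\]
% and the rate above says that gradient-based adversarial training drives this
% quantity down at O(\ln(t)^2 / t) after t updates, despite the non-smooth
% inner maximum.
```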

Does Data Augmentation Lead to Positive Margin?

no code implementations • 8 May 2019 • Shashank Rajput, Zhili Feng, Zachary Charles, Po-Ling Loh, Dimitris Papailiopoulos

Data augmentation (DA) is commonly used during model training, as it significantly improves test error and model robustness.

Data Augmentation

ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding

1 code implementation • 28 Jan 2019 • Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding.
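
A rough sketch of the approximate gradient coding idea (not ErasureHead's exact code): each data partition is replicated on several workers, and the server combines whatever the non-stragglers return, accepting a small gradient error instead of waiting for the slowest nodes.

```python
import numpy as np

def approximate_coded_gradient(partition_grads, assignments, responded):
    """Approximately recover the full gradient from the non-straggler workers.

    assignments[w] lists the data partitions replicated on worker w (each partition
    sits on d workers); a responding worker returns the sum of its partitions'
    gradients. Summing the received worker sums and dividing by d is exact when
    every worker responds and a controlled approximation when stragglers are dropped.
    """
    d = sum(0 in a for a in assignments)  # replication factor (same for every partition)
    worker_sums = [np.sum([partition_grads[p] for p in assignments[w]], axis=0)
                   for w in responded]
    return np.sum(worker_sums, axis=0) / d

# 4 partitions, each replicated on 2 of 4 workers; worker 3 straggles and is ignored.
partition_grads = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                   np.array([1.0, 1.0]), np.array([0.0, -1.0])]
assignments = [[0, 1], [1, 2], [2, 3], [3, 0]]
print(approximate_coded_gradient(partition_grads, assignments, responded=[0, 1, 2]))
print(np.sum(partition_grads, axis=0))  # exact full gradient, for comparison
```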

A Geometric Perspective on the Transferability of Adversarial Directions

no code implementations • 8 Nov 2018 • Zachary Charles, Harrison Rosenberg, Dimitris Papailiopoulos

We show that these "transferable adversarial directions" are guaranteed to exist for linear separators of a given set, and will exist with high probability for linear classifiers trained on independent sets drawn from the same distribution.
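
A short note, with assumed notation, on why a single direction suffices for a linear separator:

```latex
% For f(x) = sign(w^T x + b), perturbing any correctly classified point (x, y) along
% the fixed direction v = -y w / ||w|| crosses the decision boundary once the step
% exceeds the point's margin:
\[
  y \big( w^\top (x + t v) + b \big) = y (w^\top x + b) - t \|w\| < 0
  \quad \text{whenever } t > \frac{y (w^\top x + b)}{\|w\|}.
\]
% The same direction therefore transfers across all points, and across classifiers
% whose weight vectors are well aligned with w.
```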

Gradient Coding via the Stochastic Block Model

no code implementations • 25 May 2018 • Zachary Charles, Dimitris Papailiopoulos

Gradient descent and its many variants, including mini-batch stochastic gradient descent, form the algorithmic foundation of modern large-scale machine learning.

Stochastic Block Model

DRACO: Byzantine-resilient Distributed Training via Redundant Gradients

1 code implementation • ICML 2018 • Lingjiao Chen, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

Distributed model training is vulnerable to byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS).
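
A minimal sketch of the redundant-gradients idea (the actual DRACO encoder/decoder is more general): each gradient task is replicated on r workers, and the parameter server keeps a value only if a strict majority of the replicas agree.

```python
import numpy as np

def majority_decode(replica_grads, atol=1e-8):
    """Return the gradient value that a strict majority of replicas agree on.

    replica_grads holds the r copies of one gradient task reported by different
    workers; honest workers return identical values, so any value repeated by
    more than r/2 workers must be the honest one.
    """
    r = len(replica_grads)
    for candidate in replica_grads:
        votes = sum(np.allclose(candidate, g, atol=atol) for g in replica_grads)
        if votes > r // 2:
            return candidate
    raise ValueError("no majority: too many corrupted replicas")

# One gradient task replicated on 3 workers; one worker is Byzantine.
honest = np.array([0.5, -1.0])
print(majority_decode([honest, np.array([9.9, 9.9]), honest.copy()]))
```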

Approximate Gradient Coding via Sparse Random Graphs

no code implementations • 17 Nov 2017 • Zachary Charles, Dimitris Papailiopoulos, Jordan Ellenberg

Distributed algorithms are often beset by the straggler effect, where the slowest compute nodes in the system dictate the overall running time.

Stability and Generalization of Learning Algorithms that Converge to Global Optima

no code implementations • ICML 2018 • Zachary Charles, Dimitris Papailiopoulos

Finally, we show that although our results imply comparable stability for SGD and GD in the PL setting, there exist simple neural networks with multiple local minima where SGD is stable but GD is not.

Generalization Bounds
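
For reference, the Polyak-Lojasiewicz (PL) condition behind "the PL setting" above, in its standard form:

```latex
% A possibly non-convex loss f with minimum value f* is mu-PL if
\[
  \tfrac{1}{2} \|\nabla f(x)\|^2 \;\ge\; \mu \big( f(x) - f^\star \big)
  \quad \text{for all } x,
\]
% so every stationary point is a global minimum; the stability comparison between
% SGD and GD above is for losses satisfying this condition.
```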

Subspace Clustering with Missing and Corrupted Data

no code implementations • 8 Jul 2017 • Zachary Charles, Amin Jalali, Rebecca Willett

Given full or partial information about a collection of points that lie close to a union of several subspaces, subspace clustering refers to the process of clustering the points according to their subspace and identifying the subspaces.

Clustering
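
A minimal illustration of the assignment step when the subspaces are already known (full subspace clustering must also estimate the subspaces and handle missing or corrupted entries):

```python
import numpy as np

def assign_to_subspaces(points, bases):
    """Assign each point to the subspace with the smallest projection residual.

    bases[k] is an orthonormal basis (columns) for subspace k. This is only the
    assignment step of a k-subspaces style procedure.
    """
    labels = []
    for x in points:
        residuals = [np.linalg.norm(x - U @ (U.T @ x)) for U in bases]
        labels.append(int(np.argmin(residuals)))
    return labels

# Two 1-D subspaces (lines) in R^2 and four points lying near them.
bases = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
points = [np.array([2.0, 0.1]), np.array([0.05, -3.0]),
          np.array([-1.0, 0.0]), np.array([0.2, 1.5])]
print(assign_to_subspaces(points, bases))   # [0, 1, 0, 1]
```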
