no code implementations • 10 Jul 2024 • Zachary Charles, Arun Ganesh, Ryan McKenna, H. Brendan McMahan, Nicole Mitchell, Krishna Pillutla, Keith Rush
We investigate practical and scalable algorithms for training large language models (LLMs) with user-level differential privacy (DP) in order to provably safeguard all the examples contributed by each user.
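As a rough illustration of the user-level guarantee (not the paper's algorithm), the sketch below clips each user's total contribution before adding Gaussian noise, so privacy is accounted per user rather than per example; the clip norm and noise multiplier are hypothetical parameters.

```python
import numpy as np

def user_level_dp_aggregate(user_updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Aggregate per-user updates with user-level clipping and Gaussian noise.

    Each row of `user_updates` is the total update contributed by one user
    (already combined over all of that user's examples), so the clip bounds
    the influence of every user, not just every example.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = [u * min(1.0, clip_norm / max(np.linalg.norm(u), 1e-12))
               for u in user_updates]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(user_updates)
```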
1 code implementation • 11 Mar 2024 • Keith Rush, Zachary Charles, Zachary Garrett, Sean Augenstein, Nicole Mitchell
We present DrJAX, a JAX-based library designed to support large-scale distributed and parallel machine learning algorithms that use MapReduce-style operations.
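The MapReduce pattern DrJAX targets can be sketched in a few lines of plain JAX (an illustrative toy, not DrJAX's actual API): a per-client map expressed with `jax.vmap`, followed by a reduction over the client axis.

```python
import jax
import jax.numpy as jnp

def local_gradient(params, features, labels):
    """Map step: one client's gradient of a squared-error loss."""
    def loss(p):
        return jnp.mean((features @ p - labels) ** 2)
    return jax.grad(loss)(params)

# Vectorize the map over a leading "clients" axis, then reduce by averaging.
broadcast_map = jax.vmap(local_gradient, in_axes=(None, 0, 0))

def federated_round(params, client_features, client_labels, lr=0.1):
    client_grads = broadcast_map(params, client_features, client_labels)  # map
    avg_grad = jnp.mean(client_grads, axis=0)                             # reduce
    return params - lr * avg_grad

params = jnp.zeros(3)
xs = jnp.ones((4, 8, 3))   # 4 clients, 8 examples each, 3 features
ys = jnp.ones((4, 8))
params = federated_round(params, xs, ys)
```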
no code implementations • 17 Nov 2023 • Nikita Dhawan, Nicole Mitchell, Zachary Charles, Zachary Garrett, Gintare Karolina Dziugaite
Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization.
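For concreteness, the direct weighted average this sentence refers to looks roughly like the following, with per-client example counts as the (conventional, assumed) weights.

```python
import numpy as np

def fedavg_aggregate(client_updates, num_examples):
    """Weighted average of client parameter updates (FedAvg-style).

    client_updates: list of 1-D arrays, one update per client.
    num_examples:   list of example counts used as aggregation weights.
    """
    weights = np.asarray(num_examples, dtype=float)
    weights /= weights.sum()
    return sum(w * u for w, u in zip(weights, client_updates))
```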
no code implementations • NeurIPS 2023 • Anastasia Koloskova, Ryan McKenna, Zachary Charles, Keith Rush, Brendan McMahan
We propose a simplified setting that distills key facets of these methods and isolates the impact of linearly correlated noise.
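"Linearly correlated noise" here means the noise added at different iterations is obtained by applying a fixed linear map to i.i.d. Gaussian draws, rather than being independent across steps. A small numpy illustration, with a hypothetical lower-triangular averaging matrix standing in for whatever correlation structure a given method actually uses:

```python
import numpy as np

def correlated_noise(num_steps, dim, rng, corr_matrix=None):
    """Generate noise whose per-step components are linear combinations
    of i.i.d. Gaussian draws, i.e. linearly correlated across steps."""
    z = rng.normal(size=(num_steps, dim))  # independent noise
    if corr_matrix is None:
        # Hypothetical choice: row t averages the first t+1 i.i.d. draws.
        corr_matrix = (np.tril(np.ones((num_steps, num_steps)))
                       / np.arange(1, num_steps + 1)[:, None])
    return corr_matrix @ z                 # correlated noise

rng = np.random.default_rng(0)
noise = correlated_noise(num_steps=5, dim=3, rng=rng)
# a noisy update at step t would then use: grad_t + noise[t]
```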
no code implementations • 18 Jan 2023 • Keith Rush, Zachary Charles, Zachary Garrett
We propose a federated automatic differentiation (FAD) framework that 1) enables computing derivatives of functions involving client and server computation as well as communication between them and 2) operates in a manner compatible with existing federated technology.
no code implementations • 19 Aug 2022 • Zachary Charles, Kallista Bonawitz, Stanislav Chiknavaryan, Brendan McMahan, Blaise Agüera y Arcas
In order to make this practical, we outline a primitive, federated select, which enables client-specific selection in realistic FL systems.
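A toy illustration of what client-specific selection means (not the actual federated select primitive or its API): each client names only the keys it needs, and the server returns just those slices rather than broadcasting the full table.

```python
def federated_select(server_table, client_requests):
    """Toy illustration of client-specific selection.

    server_table:    dict mapping key -> payload (e.g. an embedding row).
    client_requests: dict mapping client_id -> list of requested keys.
    """
    return {
        client_id: {k: server_table[k] for k in keys if k in server_table}
        for client_id, keys in client_requests.items()
    }

table = {"token_a": [0.1, 0.2], "token_b": [0.3, 0.4], "token_c": [0.5, 0.6]}
slices = federated_select(table, {"client_1": ["token_a"],
                                  "client_2": ["token_b", "token_c"]})
```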
2 code implementations • 18 Jun 2022 • Shanshan Wu, Tian Li, Zachary Charles, Yu Xiao, Ziyu Liu, Zheng Xu, Virginia Smith
To better answer these questions, we propose Motley, a benchmark for personalized federated learning.
1 code implementation • 7 Jan 2022 • Nicole Mitchell, Johannes Ballé, Zachary Charles, Jakub Konečný
A significant bottleneck in federated learning (FL) is the network communication cost of sending model updates from client devices to the central server.
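One standard way to attack this bottleneck, shown here purely as context rather than as this paper's method, is to compress each update before upload; the sketch below uses a simple top-k sparsifier that sends only the indices and values of the largest-magnitude entries.

```python
import numpy as np

def topk_compress(update, k):
    """Keep only the k largest-magnitude entries of a model update,
    returning (indices, values) to transmit instead of the dense vector."""
    idx = np.argpartition(np.abs(update), -k)[-k:]
    return idx, update[idx]

def topk_decompress(idx, values, dim):
    out = np.zeros(dim)
    out[idx] = values
    return out

update = np.random.default_rng(1).normal(size=1000)
idx, vals = topk_compress(update, k=50)      # ~20x fewer floats on the wire
recovered = topk_decompress(idx, vals, update.size)
```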
no code implementations • 8 Sep 2021 • Zachary Charles, Keith Rush
In the context of federated learning, we show that when clients have loss functions whose gradients satisfy this condition, federated averaging is equivalent to gradient descent on a surrogate loss function.
2 code implementations • 14 Jul 2021 • Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, Satyen Kale, Sai Praneeth Karimireddy, Jakub Konecny, Sanmi Koyejo, Tian Li, Luyang Liu, Mehryar Mohri, Hang Qi, Sashank J. Reddi, Peter Richtarik, Karan Singhal, Virginia Smith, Mahdi Soltanolkotabi, Weikang Song, Ananda Theertha Suresh, Sebastian U. Stich, Ameet Talwalkar, Hongyi Wang, Blake Woodworth, Shanshan Wu, Felix X. Yu, Honglin Yuan, Manzil Zaheer, Mi Zhang, Tong Zhang, Chunxiang Zheng, Chen Zhu, Wennan Zhu
Federated learning and analytics are distributed approaches for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection.
no code implementations • NeurIPS 2021 • Zachary Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith
Our work highlights a number of challenges stemming from the use of larger cohorts.
no code implementations • 4 Jun 2021 • Jianyu Wang, Zheng Xu, Zachary Garrett, Zachary Charles, Luyang Liu, Gauri Joshi
Popular optimization algorithms of FL use vanilla (stochastic) gradient descent for both local updates at clients and global updates at the aggregating server.
no code implementations • 8 Mar 2021 • Zachary Charles, Jakub Konečný
Using these insights, we are able to compare local update methods based on their convergence/accuracy trade-off, not just their convergence to critical points of the empirical loss.
1 code implementation • 2 Jul 2020 • Zachary Charles, Jakub Konečný
We study a family of algorithms, which we refer to as local update methods, that generalize many federated learning and meta-learning algorithms.
6 code implementations • ICLR 2021 • Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, H. Brendan McMahan
Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data.
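A minimal numpy sketch of the server-side view of such adaptive federated optimizers, under the assumption (hypothetical here) that the server treats the average of client updates as a pseudo-gradient and applies an Adam-like step to it:

```python
import numpy as np

class AdaptiveServer:
    """Server that applies an Adam-like step to the averaged client update."""

    def __init__(self, params, lr=0.1, beta1=0.9, beta2=0.99, eps=1e-3):
        self.params = params
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros_like(params)
        self.v = np.zeros_like(params)

    def round(self, client_updates):
        # The averaged client update acts as a "pseudo-gradient" pointing
        # in a descent direction, so the server adds the adaptive step.
        delta = np.mean(client_updates, axis=0)
        self.m = self.beta1 * self.m + (1 - self.beta1) * delta
        self.v = self.beta2 * self.v + (1 - self.beta2) * delta ** 2
        self.params = self.params + self.lr * self.m / (np.sqrt(self.v) + self.eps)
        return self.params
```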
9 code implementations • 10 Dec 2019 • Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri Joshi, Mikhail Khodak, Jakub Konečný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint, Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Özgür, Rasmus Pagh, Mariana Raykova, Hang Qi, Daniel Ramage, Ramesh Raskar, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han Yu, Sen Zhao
FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches.
1 code implementation • NeurIPS 2019 • Shashank Rajput, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos
In this work, we present DETOX, a Byzantine-resilient distributed training framework that combines algorithmic redundancy with robust aggregation.
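A simplified sketch of the redundancy-plus-robust-aggregation combination described here (coordinate-wise medians stand in for the paper's specific voting and aggregation rules): workers in the same group are assigned the same data, so honest group members agree, and a second robust step combines the groups.

```python
import numpy as np

def redundant_robust_aggregate(worker_grads, group_size):
    """Hierarchical, redundancy-based aggregation of worker gradients.

    worker_grads: array of shape (num_workers, dim); workers in the same
                  group processed the same data, so honest members of a
                  group report (nearly) identical gradients.
    """
    num_workers, dim = worker_grads.shape
    assert num_workers % group_size == 0
    groups = worker_grads.reshape(-1, group_size, dim)
    # Stage 1: within-group filtering (a median acts as a majority-style vote).
    group_grads = np.median(groups, axis=1)
    # Stage 2: robust aggregation across groups (again a median here).
    return np.median(group_grads, axis=0)
```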
no code implementations • 22 May 2019 • Zachary Charles, Shashank Rajput, Stephen Wright, Dimitris Papailiopoulos
Our results are derived by showing that adversarial training with gradient updates minimizes a robust version of the empirical risk at a $\mathcal{O}(\ln(t)^2/t)$ rate, despite non-smoothness.
no code implementations • 8 May 2019 • Shashank Rajput, Zhili Feng, Zachary Charles, Po-Ling Loh, Dimitris Papailiopoulos
Data augmentation (DA) is commonly used during model training, as it significantly improves test error and model robustness.
1 code implementation • 28 Jan 2019 • Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos
We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding.
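In approximate gradient coding, each data block is replicated on several workers and the server decodes from whichever workers respond first, accepting a small gradient error instead of waiting for stragglers. The sketch below is a generic version under that assumption, not ErasureHead's specific code construction.

```python
import numpy as np

def assign_blocks(num_workers, replication):
    """Cyclically assign each data block to `replication` consecutive workers."""
    return [[(w + r) % num_workers for r in range(replication)]
            for w in range(num_workers)]

def approximate_decode(partial_sums, arrived, replication):
    """Average partial gradient sums from non-straggler workers only.

    partial_sums: (num_workers, dim) array; row w is the sum of block
                  gradients assigned to worker w.
    arrived:      boolean mask of workers that responded in time.
    """
    received = partial_sums[arrived]
    # Each block is replicated `replication` times, so rescaling by the
    # fraction of arrivals approximates the full-data gradient sum.
    return received.sum(axis=0) / (replication * arrived.mean())
```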
no code implementations • 8 Nov 2018 • Zachary Charles, Harrison Rosenberg, Dimitris Papailiopoulos
We show that these "transferable adversarial directions" are guaranteed to exist for linear separators of a given set, and will exist with high probability for linear classifiers trained on independent sets drawn from the same distribution.
1 code implementation • NeurIPS 2018 • Hongyi Wang, Scott Sievert, Zachary Charles, Shengchao Liu, Stephen Wright, Dimitris Papailiopoulos
We present ATOMO, a general framework for atomic sparsification of stochastic gradients.
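The core idea can be sketched with the simplest atomic decomposition, treating coordinates as atoms (ATOMO itself also covers decompositions such as singular vectors): drop each atom with some probability and rescale the survivors so the sparsified gradient remains unbiased.

```python
import numpy as np

def atomic_sparsify(grad, keep_prob, rng):
    """Unbiased sparsification: drop each atom (here, each coordinate)
    independently and rescale survivors by 1/keep_prob so that
    E[sparsified] == grad."""
    mask = rng.random(grad.shape) < keep_prob
    return np.where(mask, grad / keep_prob, 0.0)

rng = np.random.default_rng(0)
g = rng.normal(size=10)
g_sparse = atomic_sparsify(g, keep_prob=0.3, rng=rng)  # ~70% zeros, unbiased
```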
no code implementations • 25 May 2018 • Zachary Charles, Dimitris Papailiopoulos
Gradient descent and its many variants, including mini-batch stochastic gradient descent, form the algorithmic foundation of modern large-scale machine learning.
1 code implementation • ICML 2018 • Lingjiao Chen, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos
Distributed model training is vulnerable to Byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS).
no code implementations • 17 Nov 2017 • Zachary Charles, Dimitris Papailiopoulos, Jordan Ellenberg
Distributed algorithms are often beset by the straggler effect, where the slowest compute nodes in the system dictate the overall running time.
no code implementations • ICML 2018 • Zachary Charles, Dimitris Papailiopoulos
Finally, we show that although our results imply comparable stability for SGD and GD in the PL setting, there exist simple neural networks with multiple local minima where SGD is stable but GD is not.
no code implementations • 8 Jul 2017 • Zachary Charles, Amin Jalali, Rebecca Willett
Given full or partial information about a collection of points that lie close to a union of several subspaces, subspace clustering refers to the process of clustering the points according to their subspace and identifying the subspaces.