no code implementations • 8 Mar 2024 • Naman Agarwal, Pranjal Awasthi, Satyen Kale, Eric Zhao
Stacking, a heuristic technique for training deep residual networks by progressively increasing the number of layers and initializing new layers by copying parameters from older layers, has proven quite successful in improving the efficiency of training deep neural networks.
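For intuition, here is a minimal PyTorch-style sketch of the stacking initialization described above: new blocks are appended and initialized by copying the parameters of an existing block rather than at random. The helper name `grow_by_stacking` and the toy linear blocks are illustrative, not from the paper.

```python
import copy
import torch.nn as nn

def grow_by_stacking(blocks: nn.ModuleList, num_new: int) -> nn.ModuleList:
    """Append `num_new` blocks, each initialized as a deep copy of the current
    last block, so new layers start from trained parameters instead of a
    random initialization."""
    for _ in range(num_new):
        blocks.append(copy.deepcopy(blocks[-1]))
    return blocks

# Toy residual stack grown from 4 to 8 blocks between training stages.
blocks = nn.ModuleList([nn.Linear(64, 64) for _ in range(4)])
blocks = grow_by_stacking(blocks, num_new=4)
assert len(blocks) == 8
```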
no code implementations • 8 Feb 2024 • Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar
We propose an instantiation of this framework - Random Part Training (RAPTR) - that selects and trains only a random subnetwork (e.g., depth-wise, width-wise) of the network at each step, progressively increasing the size in stages.
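As a rough sketch of what depth-wise progressive subnetwork training could look like in code, the snippet below samples a random subset of residual blocks at each stage and runs the forward pass only through them, with skipped blocks acting as the identity; the class, schedule, and block choice are illustrative assumptions, not the paper's implementation.

```python
import random
import torch
import torch.nn as nn

class ResidualStack(nn.Module):
    def __init__(self, dim: int, depth: int):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])

    def forward(self, x, active_idx):
        # Only the sampled blocks are applied; the rest reduce to the identity
        # through the residual connection, so only the subnetwork gets gradients.
        for i, block in enumerate(self.blocks):
            if i in active_idx:
                x = x + torch.relu(block(x))
        return x

model = ResidualStack(dim=32, depth=8)
x = torch.randn(4, 32)
for frac in [0.25, 0.5, 0.75, 1.0]:                 # stage-wise size schedule
    k = max(1, int(frac * len(model.blocks)))
    active = set(random.sample(range(len(model.blocks)), k))
    out = model(x, active)                          # a training step would backprop through this
```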
1 code implementation • 17 Jan 2024 • Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato
Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication.
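A bare-bones NumPy sketch of Local-SGD / federated averaging on a toy least-squares problem, showing several local SGD updates per communication round followed by a simple model average; the objective and hyperparameters are illustrative.

```python
import numpy as np

def local_sgd(client_data, x0, local_steps=10, rounds=50, lr=0.1):
    """Each round: every client runs several SGD steps on its local least-squares
    loss, then the server averages the resulting models (federated averaging)."""
    x = x0.copy()
    for _ in range(rounds):
        client_models = []
        for A, b in client_data:
            y = x.copy()
            for _ in range(local_steps):            # more than one update per communication
                y -= lr * A.T @ (A @ y - b) / len(b)
            client_models.append(y)
        x = np.mean(client_models, axis=0)          # one communication per round
    return x

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(4)]
x_hat = local_sgd(clients, x0=np.zeros(5))
```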
no code implementations • 15 Dec 2023 • Naman Agarwal, Satyen Kale, Karan Singh, Abhradeep Guha Thakurta
We study the task of $(\epsilon, \delta)$-differentially private online convex optimization (OCO).
no code implementations • 6 Feb 2023 • Yae Jee Cho, Pranay Sharma, Gauri Joshi, Zheng Xu, Satyen Kale, Tong Zhang
Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL).
no code implementations • 13 Oct 2022 • Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan
When these potentials further satisfy certain self-bounding properties, we show that they can be used to provide a convergence guarantee for Gradient Descent (GD) and SGD (even when the paths of GF and GD/SGD are quite far apart).
no code implementations • 6 Jul 2022 • Oren Mangoubi, Yikai Wu, Satyen Kale, Abhradeep Guha Thakurta, Nisheeth K. Vishnoi
Consider the following optimization problem: Given $n \times n$ matrices $A$ and $\Lambda$, maximize $\langle A, U\Lambda U^*\rangle$ where $U$ varies over the unitary group $\mathrm{U}(n)$.
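As background for readers (not a result of this paper): in the standard case where $A$ is Hermitian and $\Lambda$ is real diagonal, this maximum has a classical closed form obtained by aligning eigenbases, stated below with both spectra sorted in decreasing order.

```latex
% Von Neumann-type identity, assuming A Hermitian and \Lambda real diagonal,
% with \lambda_1(\cdot) \ge \dots \ge \lambda_n(\cdot):
\max_{U \in \mathrm{U}(n)} \langle A, U \Lambda U^* \rangle
    = \sum_{i=1}^{n} \lambda_i(A)\, \lambda_i(\Lambda),
% attained by any U that maps the i-th eigenvector of \Lambda
% to the i-th eigenvector of A.
```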
no code implementations • 21 Jun 2022 • Rudrajit Das, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi
Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded.
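For concreteness, here is a minimal NumPy sketch of one DP-SGD step with per-sample gradient clipping and Gaussian noise, the mechanism whose standard analyses invoke the uniform-Lipschitzness (bounded per-sample gradient) assumption discussed above; the clip norm and noise scale are placeholder values.

```python
import numpy as np

def dp_sgd_step(x, per_sample_grads, clip_norm=1.0, noise_multiplier=1.0, lr=0.1):
    """One DP-SGD update: clip each per-sample gradient to `clip_norm`, average,
    add Gaussian noise calibrated to the clip norm, then take a gradient step."""
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_sample_grads]
    n = len(clipped)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm / n,
                             size=clipped[0].shape)
    return x - lr * (np.mean(clipped, axis=0) + noise)

x = np.zeros(3)
grads = [np.array([2.0, 0.0, 0.0]), np.array([0.1, 0.1, 0.1])]
x = dp_sgd_step(x, grads)
```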
no code implementations • 9 Jun 2022 • Jianyu Wang, Rudrajit Das, Gauri Joshi, Satyen Kale, Zheng Xu, Tong Zhang
Motivated by this observation, we propose a new quantity, average drift at optimum, to measure the effects of data heterogeneity, and explicitly use it to present a new theoretical analysis of FedAvg.
1 code implementation • 2 Jun 2022 • Zebang Shen, Zhenfu Wang, Satyen Kale, Alejandro Ribeiro, Amin Karbasi, Hamed Hassani
In this paper, we exploit this concept to design a potential function of the hypothesis velocity fields, and prove that, if such a function diminishes to zero during the training procedure, the trajectory of the densities generated by the hypothesis velocity fields converges to the solution of the FPE in the Wasserstein-2 sense.
no code implementations • 26 May 2022 • Sean Augenstein, Andrew Hard, Lin Ning, Karan Singhal, Satyen Kale, Kurt Partridge, Rajiv Mathews
For example, additional datacenter data can be leveraged to jointly learn from centralized (datacenter) and decentralized (federated) training data and better match an expected inference data distribution.
no code implementations • 9 Feb 2022 • Kwangjun Ahn, Prateek Jain, Ziwei Ji, Satyen Kale, Praneeth Netrapalli, Gil I. Shamir
We initiate a formal study of reproducibility in optimization.
no code implementations • 6 Feb 2022 • Julian Zimmert, Naman Agarwal, Satyen Kale
This algorithm, called SCHRODINGER'S BISONS, is the first efficient algorithm with polylogarithmic regret for this more general problem.
no code implementations • 31 Jan 2022 • Ziwei Ji, Kwangjun Ahn, Pranjal Awasthi, Satyen Kale, Stefani Karp
In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\textrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021).
no code implementations • NeurIPS 2021 • Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, Ananda Theertha Suresh
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.
no code implementations • 6 Oct 2021 • Naman Agarwal, Satyen Kale, Julian Zimmert
Previous work (Foster et al., 2018) has highlighted the importance of improper predictors for achieving "fast rates" in the online multiclass logistic regression problem without suffering exponentially from secondary problem parameters, such as the norm of the predictors in the comparison class.
2 code implementations • 14 Jul 2021 • Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, Satyen Kale, Sai Praneeth Karimireddy, Jakub Konecny, Sanmi Koyejo, Tian Li, Luyang Liu, Mehryar Mohri, Hang Qi, Sashank J. Reddi, Peter Richtarik, Karan Singhal, Virginia Smith, Mahdi Soltanolkotabi, Weikang Song, Ananda Theertha Suresh, Sebastian U. Stich, Ameet Talwalkar, Hongyi Wang, Blake Woodworth, Shanshan Wu, Felix X. Yu, Honglin Yuan, Manzil Zaheer, Mi Zhang, Tong Zhang, Chunxiang Zheng, Chen Zhu, Wennan Zhu
Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection.
no code implementations • NeurIPS 2021 • Satyen Kale, Ayush Sekhari, Karthik Sridharan
We show that there is a stochastic convex optimization (SCO) problem such that GD with any step size and number of iterations can only learn at a suboptimal rate: at least $\widetilde{\Omega}(1/n^{5/12})$.
no code implementations • 11 Mar 2021 • Zebang Shen, Hamed Hassani, Satyen Kale, Amin Karbasi
First, in the semi-heterogeneous setting, when the marginal distributions of the feature vectors on client machines are identical, we develop the federated functional gradient boosting (FFGB) method that provably converges to the global minimum.
no code implementations • 1 Mar 2021 • Jacob Abernethy, Pranjal Awasthi, Satyen Kale
This apparent lack of robustness has led researchers to propose methods that can help to prevent an adversary from having such capabilities.
no code implementations • NeurIPS 2021 • Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh
We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as $O(1/\sqrt{m})$ as users provide more samples.
no code implementations • NeurIPS 2020 • Pranjal Awasthi, Satyen Kale, Stefani Karp, Mehryar Mohri
We present a series of new PAC-Bayes learning guarantees for randomized algorithms with sample-dependent priors.
1 code implementation • 8 Aug 2020 • Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.
4 code implementations • NeurIPS 2020 • Garima Pruthi, Frederick Liu, Mukund Sundararajan, Satyen Kale
We introduce a method called TracIn that computes the influence of a training example on a prediction made by the model.
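The checkpoint-based form of TracIn can be summarized in a few lines: the influence of a training example on a test prediction is approximated by summing, over saved checkpoints, the learning rate times the dot product of the two loss gradients. The sketch below assumes a generic `grad_fn(params, example)` supplied by the caller; the toy linear example at the end is purely illustrative.

```python
import numpy as np

def tracin_influence(checkpoints, grad_fn, z_train, z_test):
    """Approximate the influence of `z_train` on the prediction for `z_test`
    as a sum over checkpoints of (learning rate) x (gradient dot product)."""
    influence = 0.0
    for params, lr in checkpoints:              # (parameters, learning rate) pairs
        g_train = grad_fn(params, z_train)      # gradient of the loss on z_train
        g_test = grad_fn(params, z_test)        # gradient of the loss on z_test
        influence += lr * float(np.dot(g_train, g_test))
    return influence

# Toy usage with a linear least-squares loss 0.5 * (w.x - y)^2.
def grad_fn(w, example):
    x, y = example
    return (w @ x - y) * x

ckpts = [(np.array([0.0, 0.0]), 0.1), (np.array([0.5, -0.2]), 0.05)]
score = tracin_influence(ckpts, grad_fn,
                         (np.array([1.0, 2.0]), 1.0),
                         (np.array([2.0, 1.0]), 0.0))
```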
no code implementations • 4 Feb 2020 • Naman Agarwal, Pranjal Awasthi, Satyen Kale
We study the role of depth in training randomly initialized overparameterized neural networks.
no code implementations • NeurIPS 2019 • Chuan Guo, Ali Mousavi, Xiang Wu, Daniel N. Holtmann-Rice, Satyen Kale, Sashank Reddi, Sanjiv Kumar
In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy.
7 code implementations • ICML 2020 • Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh
We obtain tight convergence rates for FedAvg and prove that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.
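This entry is the SCAFFOLD paper; below is a simplified sketch of the control-variate client update it proposes to counteract client drift. The interfaces (`grad_fn`, the toy quadratic) and the omitted server aggregation are assumptions made for brevity, so treat this as an approximation of the method rather than a faithful implementation.

```python
import numpy as np

def scaffold_client_update(x, c_global, c_local, grad_fn, local_steps=10, lr=0.1):
    """One client's local phase: SGD steps corrected by the control variates
    (c_global - c_local) to reduce client drift, followed by a simple
    control-variate refresh; the server would average the returned deltas."""
    y = x.copy()
    for _ in range(local_steps):
        y -= lr * (grad_fn(y) - c_local + c_global)
    c_local_new = c_local - c_global + (x - y) / (local_steps * lr)
    return y - x, c_local_new - c_local          # model delta, control-variate delta

grad_fn = lambda w: w - np.array([1.0, -2.0])    # gradient of a toy quadratic
delta_y, delta_c = scaffold_client_update(np.zeros(2), np.zeros(2), np.zeros(2), grad_fn)
```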
3 code implementations • ICLR 2018 • Sashank J. Reddi, Satyen Kale, Sanjiv Kumar
Several recently proposed stochastic optimization methods that have been used successfully to train deep networks, including RMSProp, Adam, Adadelta, and Nadam, are based on gradient updates scaled by square roots of exponential moving averages of squared past gradients.
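To make the shared update concrete, here is a NumPy sketch of a single step: Adam-style exponential moving averages of the gradient and its square, plus the AMSGrad-style running maximum of the second moment that this paper introduces to restore a convergence guarantee. Bias correction is omitted for brevity and the hyperparameters are the usual defaults, shown only for illustration.

```python
import numpy as np

def amsgrad_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update: like Adam, but the denominator uses the running
    maximum of the second-moment estimate (Adam would use `v` directly)."""
    m, v, v_hat = state
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    v_hat = np.maximum(v_hat, v)                 # the AMSGrad correction
    w = w - lr * m / (np.sqrt(v_hat) + eps)
    return w, (m, v, v_hat)

w = np.zeros(3)
state = (np.zeros(3), np.zeros(3), np.zeros(3))
w, state = amsgrad_step(w, np.array([0.5, -1.0, 2.0]), state)
```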
no code implementations • NeurIPS 2019 • Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan
Our main result is a generalization bound for data-dependent hypothesis sets expressed in terms of a notion of hypothesis set stability and a notion of Rademacher complexity for data-dependent hypothesis sets that we introduce.
no code implementations • 26 Jan 2019 • Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra
Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood.
1 code implementation • NeurIPS 2018 • Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar
In this work, we provide a new analysis of such methods applied to nonconvex stochastic optimization problems, characterizing the effect of increasing minibatch size.
no code implementations • 16 Oct 2018 • Sashank J. Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, Sanjiv Kumar
Furthermore, we identify a particularly intuitive class of loss functions in the aforementioned family and show that they are amenable to practical implementation in the large output space setting (i.e., computation is possible without evaluating scores of all labels) by developing a technique called Stochastic Negative Mining.
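A rough sketch of the sampling-then-mining idea: score only a random subset of negative labels, keep the top-scoring ("hardest") ones among the sample, and build the loss from those. The hinge-style loss and the parameter names below are placeholders; the paper develops a whole family of such losses, so this is only an illustration of the general recipe.

```python
import numpy as np

def snm_loss(score_fn, num_labels, positive, num_sampled=100, k=5, margin=1.0, rng=None):
    """Sample negatives, keep the k highest-scoring sampled negatives, and
    compute a hinge-style loss against them; only the positive label and the
    sampled negatives ever need to be scored."""
    rng = rng or np.random.default_rng()
    negatives = np.setdiff1d(np.arange(num_labels), [positive])
    sampled = rng.choice(negatives, size=min(num_sampled, len(negatives)), replace=False)
    neg_scores = score_fn(sampled)
    hardest = np.sort(neg_scores)[-k:]                     # top-k sampled negatives
    return np.maximum(0.0, margin + hardest - score_fn(np.array([positive]))[0]).mean()

# Toy scorer over 10,000 labels; in practice this would be a learned model.
weights = np.random.default_rng(0).normal(size=10_000)
loss = snm_loss(lambda labels: weights[labels], num_labels=10_000, positive=42)
```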
no code implementations • ICML 2018 • Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar
For problems with large output spaces, evaluation of the loss function and its gradient are expensive, typically taking linear time in the size of the output space.
no code implementations • 25 Mar 2018 • Dylan J. Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan
Starting with the simple observation that the logistic loss is $1$-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm.
no code implementations • NeurIPS 2018 • Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, Ashwin Nayak
Even in the "non-realizable" setting---where there could be arbitrary noise in the measurement outcomes---we show how to output hypothesis states that do significantly worse than the best possible states at most $\operatorname{O}\!\left(\sqrt {Tn}\right) $ times on the first $T$ measurements.
no code implementations • NeurIPS 2017 • Dylan J. Foster, Satyen Kale, Mehryar Mohri, Karthik Sridharan
We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning.
no code implementations • ICML 2017 • Satyen Kale, Zohar Karnin, Tengyuan Liang, Dávid Pál
Online sparse linear regression is an online problem where an algorithm repeatedly chooses a subset of coordinates to observe in an adversarially chosen feature vector, makes a real-valued prediction, receives the true label, and incurs the squared loss.
no code implementations • 7 Mar 2016 • Dean Foster, Satyen Kale, Howard Karloff
We consider the online sparse linear regression problem, which is the problem of sequentially making predictions observing only a limited number of features in each round, to minimize regret with respect to the best sparse linear regressor, where prediction accuracy is measured by square loss.
no code implementations • NeurIPS 2016 • Satyen Kale, Chansoo Lee, Dávid Pál
We show that several online combinatorial optimization problems that admit efficient no-regret algorithms become computationally hard in the sleeping setting where a subset of actions becomes unavailable in each round.
no code implementations • NeurIPS 2015 • Alina Beygelzimer, Elad Hazan, Satyen Kale, Haipeng Luo
We extend the theory of boosting for regression problems to the online learning setting.
no code implementations • 9 Feb 2015 • Alina Beygelzimer, Satyen Kale, Haipeng Luo
We study online boosting, the task of converting any weak online learner into a strong online learner.
1 code implementation • 4 Feb 2014 • Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.
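Since the paper's oracle-efficient algorithm is involved, the sketch below only illustrates the interaction protocol itself, using a simple epsilon-greedy linear baseline that is explicitly not the paper's method; the toy environment and all names are placeholders.

```python
import numpy as np

class ToyEnv:
    """Placeholder environment: rewards are noisy linear functions of the context."""
    def __init__(self, num_actions, dim, seed=1):
        self.rng = np.random.default_rng(seed)
        self.theta = self.rng.normal(size=(num_actions, dim))
    def new_context(self):
        return self.rng.normal(size=self.theta.shape[1])
    def reward(self, context, action):
        return float(self.theta[action] @ context + 0.1 * self.rng.normal())

def contextual_bandit_loop(env, num_actions, dim, rounds=1000, eps=0.1, lr=0.05):
    """Contextual bandit protocol: observe a context, pick one of K actions,
    see the reward only for that action, update a per-action linear model.
    (Epsilon-greedy baseline, not the oracle-based algorithm of this paper.)"""
    rng = np.random.default_rng(0)
    W = np.zeros((num_actions, dim))
    total_reward = 0.0
    for _ in range(rounds):
        context = env.new_context()
        if rng.random() < eps:
            a = int(rng.integers(num_actions))            # explore
        else:
            a = int(np.argmax(W @ context))               # exploit current estimates
        r = env.reward(context, a)                        # reward only for the chosen action
        W[a] += lr * (r - W[a] @ context) * context       # SGD on squared error
        total_reward += r
    return W, total_reward

W, total = contextual_bandit_loop(ToyEnv(5, 8), num_actions=5, dim=8)
```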
no code implementations • NeurIPS 2013 • Jacob Abernethy, Satyen Kale
We consider the design of strategies for market making in a market like a stock, commodity, or currency exchange.
no code implementations • 19 Jun 2013 • Satyen Kale
We solve the COLT 2013 open problem of Seldin, Crammer, and Bartlett on minimizing regret in the setting of advice-efficient multiarmed bandits with expert advice.
no code implementations • 22 Apr 2013 • Arpita Ghosh, Satyen Kale, Kevin Lang, Benjamin Moseley
We study trade networks with a tree structure, where a seller with a single indivisible good is connected to buyers, each with some value for the good, via a unique path of intermediaries.
no code implementations • NeurIPS 2011 • Elad Hazan, Satyen Kale
We prove that the regret of NEWTRON is $O(\log T)$ when $\alpha$ is a constant that does not vary with the horizon $T$, and at most $O(T^{2/3})$ if $\alpha$ is allowed to increase to infinity with $T$.
no code implementations • NeurIPS 2010 • Satyen Kale, Lev Reyzin, Robert E. Schapire
We consider bandit problems, motivated by applications in online advertising and news story selection, in which the learner must repeatedly select a slate, that is, a subset of size $s$ from $K$ possible actions, and then receives rewards for just the selected actions.
no code implementations • NeurIPS 2009 • Elad Hazan, Satyen Kale
We consider an online decision problem over a discrete space in which the loss function is submodular.
no code implementations • NeurIPS 2009 • Elad Hazan, Satyen Kale
In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM).
no code implementations • NeurIPS 2007 • Elad Hazan, Satyen Kale
We study the relation between notions of game-theoretic equilibria which are based on stability under a set of deviations, and empirical equilibria which are reached by rational players.