Search Results for author: Satyen Kale

Found 50 papers, 9 papers with code

Stacking as Accelerated Gradient Descent

no code implementations8 Mar 2024 Naman Agarwal, Pranjal Awasthi, Satyen Kale, Eric Zhao

Stacking, a heuristic technique for training deep residual networks by progressively increasing the number of layers and initializing new layers by copying parameters from older layers, has proven quite successful in improving the efficiency of training deep neural networks.
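
The recipe described above lends itself to a short sketch. The following is a hypothetical illustration (not the authors' code) of depth-wise stacking for a residual network in PyTorch: each growth step doubles the number of blocks, and every new block is initialized as a copy of an existing trained block.

```python
# Hypothetical sketch of the stacking heuristic: grow depth by copying blocks.
import copy
import torch.nn as nn

class ResidualStack(nn.Module):
    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)          # residual connection; norms/attention elided
        return x

def stack(model: ResidualStack) -> ResidualStack:
    """Double the depth, initializing each new block as a copy of a trained one."""
    new_blocks = []
    for block in model.blocks:
        new_blocks.append(block)
        new_blocks.append(copy.deepcopy(block))   # new layer copies old parameters
    return ResidualStack(new_blocks)
```

Training would alternate between optimizing the current stack for some number of steps and calling stack() to grow it; the paper's contribution is the analysis connecting this schedule to accelerated gradient descent, not the code pattern itself.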

Efficient Stagewise Pretraining via Progressive Subnetworks

no code implementations8 Feb 2024 Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar

We propose an instantiation of this framework - Random Part Training (RAPTR) - that selects and trains only a random subnetwork (e.g. depth-wise, width-wise) of the network at each step, progressively increasing the size in stages.

Inductive Bias
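
As a rough illustration of the progressive-subnetwork idea (not the released RAPTR implementation), the sketch below activates a random depth-wise subnetwork of residual blocks at each step and grows the kept fraction across stages; skipped blocks reduce to the identity.

```python
# Illustrative sketch: train a random depth-wise subnetwork, growing it in stages.
import random
import torch.nn as nn

def random_depth_subnetwork(num_blocks: int, keep_fraction: float) -> set:
    """Choose which residual blocks are active for this training step."""
    k = max(1, round(keep_fraction * num_blocks))
    return set(random.sample(range(num_blocks), k))

def forward_with_subnetwork(x, blocks: nn.ModuleList, active: set):
    for i, block in enumerate(blocks):
        if i in active:
            x = x + block(x)          # residual form: an inactive block is the identity
    return x

# Example stage schedule: 25% of the depth, then 50%, then the full network.
stage_fractions = [0.25, 0.5, 1.0]
```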

Asynchronous Local-SGD Training for Language Modeling

1 code implementation17 Jan 2024 Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato

Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication.

Distributed Optimization, Language Modelling
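
For readers unfamiliar with the setup, here is a minimal synchronous Local-SGD round in NumPy; the paper studies an asynchronous variant for language models, so this is only the baseline template, with worker_grad_fns standing in for per-device data.

```python
# Minimal sketch of one synchronous Local-SGD / federated averaging round.
import numpy as np

def local_sgd_round(global_params, worker_grad_fns, local_steps=8, lr=0.1):
    """One communication round: several local SGD steps per worker, then an average."""
    local_params = []
    for grad_fn in worker_grad_fns:           # grad_fn(params) -> stochastic gradient
        w = global_params.copy()
        for _ in range(local_steps):
            w -= lr * grad_fn(w)              # local update, no communication
        local_params.append(w)
    return np.mean(local_params, axis=0)      # server averages the local models
```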

Improved Differentially Private and Lazy Online Convex Optimization

no code implementations15 Dec 2023 Naman Agarwal, Satyen Kale, Karan Singh, Abhradeep Guha Thakurta

We study the task of $(\epsilon, \delta)$-differentially private online convex optimization (OCO).

On the Convergence of Federated Averaging with Cyclic Client Participation

no code implementations6 Feb 2023 Yae Jee Cho, Pranay Sharma, Gauri Joshi, Zheng Xu, Satyen Kale, Tong Zhang

Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL).

Federated Learning

From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent

no code implementations13 Oct 2022 Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan

When these potentials further satisfy certain self-bounding properties, we show that they can be used to provide a convergence guarantee for Gradient Descent (GD) and SGD (even when the paths of GF and GD/SGD are quite far apart).

Retrieval

Private Matrix Approximation and Geometry of Unitary Orbits

no code implementations6 Jul 2022 Oren Mangoubi, Yikai Wu, Satyen Kale, Abhradeep Guha Thakurta, Nisheeth K. Vishnoi

Consider the following optimization problem: Given $n \times n$ matrices $A$ and $\Lambda$, maximize $\langle A, U\Lambda U^*\rangle$ where $U$ varies over the unitary group $\mathrm{U}(n)$.
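
For Hermitian $A$ and real diagonal $\Lambda$, the maximum over the unitary group has a classical closed form: align the eigenbasis of $U\Lambda U^*$ with that of $A$, giving the sum of products of similarly ordered eigenvalues. The NumPy check below is an illustration of that fact under those assumptions, not the paper's private algorithm.

```python
# Numerical check: max over unitary U of <A, U Lambda U*> is attained by aligning
# eigenbases, equal to the sum of similarly ordered eigenvalue products.
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2                      # random Hermitian matrix
lam = np.sort(rng.standard_normal(n))[::-1]   # spectrum of Lambda, sorted descending

a, V = np.linalg.eigh(A)                      # ascending eigenvalues, eigenvectors in columns
a, V = a[::-1], V[:, ::-1]                    # re-sort descending to match lam

closed_form = float(np.sum(a * lam))          # conjectured maximum: aligned spectra

# Candidate maximizer: U whose columns are eigenvectors of A, in matching order.
U = V
val = np.real(np.trace(A.conj().T @ (U @ np.diag(lam) @ U.conj().T)))

# Random unitaries should never exceed the aligned value (up to floating point).
best_random = -np.inf
for _ in range(2000):
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    cand = np.real(np.trace(A.conj().T @ (Q @ np.diag(lam) @ Q.conj().T)))
    best_random = max(best_random, cand)

print(f"aligned value   : {val:.6f}")
print(f"closed form     : {closed_form:.6f}")
print(f"best of random U: {best_random:.6f}")
```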

Beyond Uniform Lipschitz Condition in Differentially Private Optimization

no code implementations21 Jun 2022 Rudrajit Das, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi

Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded.

Benchmarking, regression
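
For context, the uniform Lipschitz assumption corresponds to the per-sample clipping step in standard DP-SGD. The sketch below shows one generic DP-SGD step (per-sample clipping plus Gaussian noise); the function and its fixed clip_norm and noise_multiplier are illustrative and not taken from the paper, which analyzes settings where per-sample gradients are not uniformly bounded.

```python
# Generic sketch of one DP-SGD step: clip each per-sample gradient, average,
# and add Gaussian noise calibrated to the clipping norm and batch size.
import numpy as np

def dp_sgd_step(params, per_sample_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    rng = np.random.default_rng()
    clipped = []
    for g in per_sample_grads:                      # g: gradient for one example
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_sample_grads),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
```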

On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

no code implementations9 Jun 2022 Jianyu Wang, Rudrajit Das, Gauri Joshi, Satyen Kale, Zheng Xu, Tong Zhang

Motivated by this observation, we propose a new quantity, average drift at optimum, to measure the effects of data heterogeneity, and explicitly use it to present a new theoretical analysis of FedAvg.

Federated Learning

Self-Consistency of the Fokker-Planck Equation

1 code implementation2 Jun 2022 Zebang Shen, Zhenfu Wang, Satyen Kale, Alejandro Ribeiro, Amin Karbasi, Hamed Hassani

In this paper, we exploit this concept to design a potential function of the hypothesis velocity fields, and prove that, if such a function diminishes to zero during the training procedure, the trajectory of the densities generated by the hypothesis velocity fields converges to the solution of the FPE in the Wasserstein-2 sense.

Mixed Federated Learning: Joint Decentralized and Centralized Learning

no code implementations26 May 2022 Sean Augenstein, Andrew Hard, Lin Ning, Karan Singhal, Satyen Kale, Kurt Partridge, Rajiv Mathews

For example, additional datacenter data can be leveraged to jointly learn from centralized (datacenter) and decentralized (federated) training data and better match an expected inference data distribution.

Federated Learning

Pushing the Efficiency-Regret Pareto Frontier for Online Learning of Portfolios and Quantum States

no code implementations6 Feb 2022 Julian Zimmert, Naman Agarwal, Satyen Kale

This algorithm, called SCHRODINGER'S BISONS, is the first efficient algorithm with polylogarithmic regret for this more general problem.

Agnostic Learnability of Halfspaces via Logistic Loss

no code implementations31 Jan 2022 Ziwei Ji, Kwangjun Ahn, Pranjal Awasthi, Satyen Kale, Stefani Karp

In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\textrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021).

regression

Breaking the centralized barrier for cross-device federated learning

no code implementations NeurIPS 2021 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

Efficient Methods for Online Multiclass Logistic Regression

no code implementations6 Oct 2021 Naman Agarwal, Satyen Kale, Julian Zimmert

Previous work (Foster et al., 2018) has highlighted the importance of improper predictors for achieving "fast rates" in the online multiclass logistic regression problem without suffering exponentially from secondary problem parameters, such as the norm of the predictors in the comparison class.

regression

SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs

no code implementations NeurIPS 2021 Satyen Kale, Ayush Sekhari, Karthik Sridharan

We show that there is an SCO problem such that GD with any step size and number of iterations can only learn at a suboptimal rate: at least $\widetilde{\Omega}(1/n^{5/12})$.

Federated Functional Gradient Boosting

no code implementations11 Mar 2021 Zebang Shen, Hamed Hassani, Satyen Kale, Amin Karbasi

First, in the semi-heterogeneous setting, when the marginal distributions of the feature vectors on client machines are identical, we develop the federated functional gradient boosting (FFGB) method that provably converges to the global minimum.

Federated Learning

A Multiclass Boosting Framework for Achieving Fast and Provable Adversarial Robustness

no code implementations1 Mar 2021 Jacob Abernethy, Pranjal Awasthi, Satyen Kale

This apparent lack of robustness has led researchers to propose methods that can help to prevent an adversary from having such capabilities.

Adversarial Robustness, Object Recognition

Learning with User-Level Privacy

no code implementations NeurIPS 2021 Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh

We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as $O(1/\sqrt{m})$ as users provide more samples.

PAC-Bayes Learning Bounds for Sample-Dependent Priors

no code implementations NeurIPS 2020 Pranjal Awasthi, Satyen Kale, Stefani Karp, Mehryar Mohri

We present a series of new PAC-Bayes learning guarantees for randomized algorithms with sample-dependent priors.

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

1 code implementation8 Aug 2020 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

Estimating Training Data Influence by Tracing Gradient Descent

4 code implementations NeurIPS 2020 Garima Pruthi, Frederick Liu, Mukund Sundararajan, Satyen Kale

We introduce a method called TracIn that computes the influence of a training example on a prediction made by the model.
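
The checkpoint-based TracIn score is a sum, over saved checkpoints, of the learning rate times the dot product of the training-example and test-example loss gradients. Below is a minimal PyTorch sketch under that reading; the helper names and the (state_dict, learning_rate) checkpoint format are assumptions, not the authors' released API.

```python
# Minimal sketch of checkpoint-based TracIn influence scores.
import torch

def flat_grad(model, loss):
    """Flatten the gradient of a scalar loss w.r.t. all trainable parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def tracin_influence(model, loss_fn, checkpoints, z_train, y_train, z_test, y_test):
    """Sum over checkpoints of lr * <grad of train loss, grad of test loss>."""
    score = 0.0
    for state_dict, lr in checkpoints:          # assumed (state_dict, learning_rate) pairs
        model.load_state_dict(state_dict)
        g_train = flat_grad(model, loss_fn(model(z_train), y_train))
        g_test = flat_grad(model, loss_fn(model(z_test), y_test))
        score += lr * torch.dot(g_train, g_test).item()
    return score
```

Real implementations typically batch these gradient computations and may restrict attention to a subset of layers for scalability.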

A Deep Conditioning Treatment of Neural Networks

no code implementations4 Feb 2020 Naman Agarwal, Pranjal Awasthi, Satyen Kale

We study the role of depth in training randomly initialized overparameterized neural networks.

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

no code implementations NeurIPS 2019 Chuan Guo, Ali Mousavi, Xiang Wu, Daniel N. Holtmann-Rice, Satyen Kale, Sashank Reddi, Sanjiv Kumar

In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy.

Attribute Classification +2

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

7 code implementations ICML 2020 Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

We obtain tight convergence rates for FedAvg and prove that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.

Distributed Optimization, Federated Learning
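
A simplified NumPy sketch of the control-variate correction is below: each client's local step subtracts its own control variate and adds the server's, which is what counteracts client drift. Server/client learning-rate separation, partial participation, and the paper's alternative option for updating client control variates are elided, so treat this as a reading aid rather than the algorithm as specified.

```python
# Simplified sketch of a SCAFFOLD-style round with control variates (c_locals, c).
import numpy as np

def scaffold_round(x, clients, c, c_locals, local_steps=10, lr=0.1):
    """clients[i] is a grad_fn(params); c is the server control variate."""
    deltas, c_deltas = [], []
    for i, grad_fn in enumerate(clients):
        y = x.copy()
        for _ in range(local_steps):
            y -= lr * (grad_fn(y) - c_locals[i] + c)    # drift-corrected local step
        c_new = c_locals[i] - c + (x - y) / (local_steps * lr)
        c_deltas.append(c_new - c_locals[i])
        c_locals[i] = c_new
        deltas.append(y - x)
    x = x + np.mean(deltas, axis=0)                     # server model update
    c = c + np.mean(c_deltas, axis=0)                   # assumes full participation
    return x, c
```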

On the Convergence of Adam and Beyond

3 code implementations ICLR 2018 Sashank J. Reddi, Satyen Kale, Sanjiv Kumar

Several recently proposed stochastic optimization methods that have been successfully used in training deep networks such as RMSProp, Adam, Adadelta, Nadam are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients.

Stochastic Optimization
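
The fix proposed in this paper, AMSGrad, keeps a running maximum of the second-moment estimate so that the effective step size never increases. A compact NumPy sketch follows; bias correction is omitted here for brevity.

```python
# Compact sketch of AMSGrad: Adam-style updates with a running max of the
# second-moment estimate in the denominator.
import numpy as np

def amsgrad(grad_fn, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)          # first moment (EMA of gradients)
    v = np.zeros_like(x)          # second moment (EMA of squared gradients)
    v_hat = np.zeros_like(x)      # running max of v -- the AMSGrad modification
    for _ in range(steps):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)
        x = x - lr * m / (np.sqrt(v_hat) + eps)
    return x

# Example: minimize the quadratic (x - 3)^2 via its gradient 2(x - 3).
print(amsgrad(lambda x: 2 * (x - 3.0), x0=[0.0], lr=0.01, steps=5000))  # close to 3.0
```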

Hypothesis Set Stability and Generalization

no code implementations NeurIPS 2019 Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan

Our main result is a generalization bound for data-dependent hypothesis sets expressed in terms of a notion of hypothesis set stability and a notion of Rademacher complexity for data-dependent hypothesis sets that we introduce.

Escaping Saddle Points with Adaptive Gradient Methods

no code implementations26 Jan 2019 Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood.

Adaptive Methods for Nonconvex Optimization

1 code implementation NeurIPS 2018 Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar

In this work, we provide a new analysis of such methods applied to nonconvex stochastic optimization problems, characterizing the effect of increasing minibatch size.

Stochastic Optimization

Stochastic Negative Mining for Learning with Large Output Spaces

no code implementations16 Oct 2018 Sashank J. Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, Sanjiv Kumar

Furthermore, we identify a particularly intuitive class of loss functions in the aforementioned family and show that they are amenable to practical implementation in the large output space setting (i.e., computation is possible without evaluating scores of all labels) by developing a technique called Stochastic Negative Mining.

Retrieval
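
As a rough illustration of how a loss can avoid scoring all labels, the sketch below samples a batch of negative labels, keeps the highest-scoring ones, and forms a hinge-style loss over just those and the positive label. The specific loss and parameter choices are placeholders; the paper studies a family of such losses and their properties in large output spaces.

```python
# Illustrative sketch of stochastic negative mining with a placeholder hinge loss.
import numpy as np

def sampled_hinge_loss(score_fn, x, positive_label, num_labels,
                       sample_size=100, top_k=5, margin=1.0):
    rng = np.random.default_rng()
    candidates = [l for l in range(num_labels) if l != positive_label]
    negatives = rng.choice(candidates, size=sample_size, replace=False)
    neg_scores = np.array([score_fn(x, l) for l in negatives])   # only sampled labels scored
    hardest = np.sort(neg_scores)[-top_k:]                       # top-scoring sampled negatives
    pos_score = score_fn(x, positive_label)
    return np.maximum(0.0, margin + hardest - pos_score).mean()
```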

Loss Decomposition for Fast Learning in Large Output Spaces

no code implementations ICML 2018 Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar

For problems with large output spaces, evaluation of the loss function and its gradient are expensive, typically taking linear time in the size of the output space.

Word Embeddings

Logistic Regression: The Importance of Being Improper

no code implementations25 Mar 2018 Dylan J. Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan

Starting with the simple observation that the logistic loss is $1$-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm.

regression

Online Learning of Quantum States

no code implementations NeurIPS 2018 Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, Ashwin Nayak

Even in the "non-realizable" setting, where there could be arbitrary noise in the measurement outcomes, we show how to output hypothesis states that do significantly worse than the best possible states at most $O(\sqrt{Tn})$ times on the first $T$ measurements.

Parameter-free online learning via model selection

no code implementations NeurIPS 2017 Dylan J. Foster, Satyen Kale, Mehryar Mohri, Karthik Sridharan

We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning.

Model Selection

Adaptive Feature Selection: Computationally Efficient Online Sparse Linear Regression under RIP

no code implementations ICML 2017 Satyen Kale, Zohar Karnin, Tengyuan Liang, Dávid Pál

Online sparse linear regression is an online problem where an algorithm repeatedly chooses a subset of coordinates to observe in an adversarially chosen feature vector, makes a real-valued prediction, receives the true label, and incurs the squared loss.

feature selection, regression
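
The interaction protocol is easy to state in code. The loop below is only the protocol skeleton, with placeholder learner callbacks, not the RIP-based algorithm from the paper: each round, at most $k$ coordinates of the adversarially chosen feature vector are revealed, a prediction is made, and squared loss is incurred.

```python
# Protocol skeleton for online sparse linear regression with limited observation.
import numpy as np

def online_sparse_regression(env, choose_coords, predict, update, rounds, k):
    """env(t) -> (x_t, y_t); the learner only sees x_t restricted to its chosen coords."""
    total_loss = 0.0
    for t in range(rounds):
        x_t, y_t = env(t)
        coords = list(choose_coords(t))[:k]       # subset of at most k coordinates
        observed = x_t[coords]                    # only these entries are revealed
        y_hat = predict(coords, observed)
        total_loss += (y_hat - y_t) ** 2          # squared loss
        update(coords, observed, y_t, y_hat)      # learner's state update
    return total_loss
```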

Online Sparse Linear Regression

no code implementations7 Mar 2016 Dean Foster, Satyen Kale, Howard Karloff

We consider the online sparse linear regression problem, which is the problem of sequentially making predictions while observing only a limited number of features in each round, with the goal of minimizing regret with respect to the best sparse linear regressor, where prediction accuracy is measured by square loss.

regression

Hardness of Online Sleeping Combinatorial Optimization Problems

no code implementations NeurIPS 2016 Satyen Kale, Chansoo Lee, Dávid Pál

We show that several online combinatorial optimization problems that admit efficient no-regret algorithms become computationally hard in the sleeping setting where a subset of actions becomes unavailable in each round.

Combinatorial Optimization, PAC learning

Online Gradient Boosting

no code implementations NeurIPS 2015 Alina Beygelzimer, Elad Hazan, Satyen Kale, Haipeng Luo

We extend the theory of boosting for regression problems to the online learning setting.

regression

Optimal and Adaptive Algorithms for Online Boosting

no code implementations9 Feb 2015 Alina Beygelzimer, Satyen Kale, Haipeng Luo

We study online boosting, the task of converting any weak online learner into a strong online learner.

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

1 code implementation4 Feb 2014 Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

General Classification, Multi-Armed Bandits
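
To make the setting concrete, here is the bare interaction loop with a toy epsilon-greedy policy standing in for the paper's oracle-efficient algorithm; the env interface (context(t), reward(t, a)) is hypothetical. The defining constraint is visible in the loop: the reward is observed only for the chosen action.

```python
# Toy contextual bandit loop; the policy here is a stand-in, not the paper's algorithm.
import numpy as np

def contextual_bandit_loop(env, K, rounds, epsilon=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    counts = np.zeros(K)
    values = np.zeros(K)          # context-free value estimates, for illustration only
    total_reward = 0.0
    for t in range(rounds):
        context = env.context(t)              # observed context (unused by this toy policy)
        if rng.random() < epsilon:
            a = int(rng.integers(K))          # explore
        else:
            a = int(np.argmax(values))        # exploit
        r = env.reward(t, a)                  # reward observed only for the chosen action
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]
        total_reward += r
    return total_reward
```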

Adaptive Market Making via Online Learning

no code implementations NeurIPS 2013 Jacob Abernethy, Satyen Kale

We consider the design of strategies for market making in a market like a stock, commodity, or currency exchange.

Multiarmed Bandits With Limited Expert Advice

no code implementations19 Jun 2013 Satyen Kale

We solve the COLT 2013 open problem of [SCB] on minimizing regret in the setting of advice-efficient multiarmed bandits with expert advice.

Bargaining for Revenue Shares on Tree Trading Networks

no code implementations22 Apr 2013 Arpita Ghosh, Satyen Kale, Kevin Lang, Benjamin Moseley

We study trade networks with a tree structure, where a seller with a single indivisible good is connected to buyers, each with some value for the good, via a unique path of intermediaries.

Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

no code implementations NeurIPS 2011 Elad Hazan, Satyen Kale

We prove that the regret of Newtron is $O(\log T)$ when $\alpha$ is a constant that does not vary with horizon $T$, and at most $O(T^{2/3})$ if $\alpha$ is allowed to increase to infinity with $T$.

Non-Stochastic Bandit Slate Problems

no code implementations NeurIPS 2010 Satyen Kale, Lev Reyzin, Robert E. Schapire

We consider bandit problems, motivated by applications in online advertising and news story selection, in which the learner must repeatedly select a slate, that is, a subset of size s from K possible actions, and then receives rewards for just the selected actions.

Beyond Convexity: Online Submodular Minimization

no code implementations NeurIPS 2009 Elad Hazan, Satyen Kale

We consider an online decision problem over a discrete space in which the loss function is submodular.

On Stochastic and Worst-case Models for Investing

no code implementations NeurIPS 2009 Elad Hazan, Satyen Kale

In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM).

Management
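
For background, the GBM model referenced above generates prices via $S_{t+\Delta t} = S_t \exp((\mu - \sigma^2/2)\,\Delta t + \sigma\sqrt{\Delta t}\, Z)$ with standard normal $Z$. The sketch below simulates one path; it illustrates the stochastic model the paper contrasts with worst-case (regret-based) guarantees, not any algorithm from the paper.

```python
# Simulate one Geometric Brownian Motion price path (illustration only).
import numpy as np

def simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, dt=1 / 252, steps=252, rng=None):
    rng = rng or np.random.default_rng(0)
    z = rng.standard_normal(steps)
    log_returns = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(log_returns))   # cumulative log-returns give the path

path = simulate_gbm()
print(path[-1])   # terminal price of one sample path
```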

Computational Equivalence of Fixed Points and No Regret Algorithms, and Convergence to Equilibria

no code implementations NeurIPS 2007 Elad Hazan, Satyen Kale

We study the relation between notions of game-theoretic equilibria which are based on stability under a set of deviations, and empirical equilibria which are reached by rational players.
