Search Results for author: Satyen Kale

Found 50 papers, 9 papers with code

Computational Equivalence of Fixed Points and No Regret Algorithms, and Convergence to Equilibria

no code implementations NeurIPS 2007 Elad Hazan, Satyen Kale

We study the relation between notions of game-theoretic equilibria which are based on stability under a set of deviations, and empirical equilibria which are reached by rational players.

On Stochastic and Worst-case Models for Investing

no code implementations NeurIPS 2009 Elad Hazan, Satyen Kale

In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM).

Management

Beyond Convexity: Online Submodular Minimization

no code implementations NeurIPS 2009 Elad Hazan, Satyen Kale

We consider an online decision problem over a discrete space in which the loss function is submodular.
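
Submodular loss functions are exactly those with diminishing returns: adding an element to a smaller set helps at least as much as adding it to a larger superset. A quick illustration (not from the paper) with a coverage function, where the ground sets and element names are made up for the example:

```python
# Coverage function f(S) = |union of the sets indexed by S| is submodular:
# f(S + {e}) - f(S) >= f(T + {e}) - f(T) whenever S is a subset of T.
from itertools import chain

ground_sets = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
}

def coverage(S):
    """f(S) = size of the union of the chosen sets."""
    return len(set(chain.from_iterable(ground_sets[i] for i in S)))

S = {"a"}          # smaller set
T = {"a", "b"}     # superset of S
e = "c"            # element to add

gain_small = coverage(S | {e}) - coverage(S)   # marginal gain w.r.t. S
gain_large = coverage(T | {e}) - coverage(T)   # marginal gain w.r.t. T
assert gain_small >= gain_large                # diminishing returns
print(gain_small, gain_large)                  # 3 2
```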

Non-Stochastic Bandit Slate Problems

no code implementations NeurIPS 2010 Satyen Kale, Lev Reyzin, Robert E. Schapire

We consider bandit problems, motivated by applications in online advertising and news story selection, in which the learner must repeatedly select a slate, that is, a subset of size s from K possible actions, and then receives rewards for just the selected actions.

Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

no code implementations NeurIPS 2011 Elad Hazan, Satyen Kale

We prove that the regret of Newtron is $O(\log T)$ when $\alpha$ is a constant that does not vary with horizon $T$, and at most $O(T^{2/3})$ if $\alpha$ is allowed to increase to infinity with $T$.

Bargaining for Revenue Shares on Tree Trading Networks

no code implementations22 Apr 2013 Arpita Ghosh, Satyen Kale, Kevin Lang, Benjamin Moseley

We study trade networks with a tree structure, where a seller with a single indivisible good is connected to buyers, each with some value for the good, via a unique path of intermediaries.

Multiarmed Bandits With Limited Expert Advice

no code implementations19 Jun 2013 Satyen Kale

We solve the COLT 2013 open problem of Seldin, Crammer, and Bartlett on minimizing regret in the setting of advice-efficient multiarmed bandits with expert advice.

Adaptive Market Making via Online Learning

no code implementations NeurIPS 2013 Jacob Abernethy, Satyen Kale

We consider the design of strategies for \emph{market making} in a market like a stock, commodity, or currency exchange.

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

1 code implementation4 Feb 2014 Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

General Classification Multi-Armed Bandits
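
The interaction protocol described in the abstract can be sketched as below. The epsilon-greedy learner and the synthetic linear-reward environment are illustrative stand-ins only, not the oracle-efficient algorithm from the paper; all parameter names and values are assumptions for the example.

```python
# Sketch of the contextual bandit protocol: at each round the learner sees a
# context, picks one of K actions, and observes the reward for that action only.
import numpy as np

rng = np.random.default_rng(0)
K, d, T, eps = 5, 10, 2000, 0.1
theta = rng.normal(size=(K, d))          # hidden per-action reward model
counts = np.zeros(K)
value_estimates = np.zeros(K)            # running mean reward per action

for t in range(T):
    context = rng.normal(size=d)
    if rng.random() < eps:
        action = int(rng.integers(K))              # explore
    else:
        action = int(np.argmax(value_estimates))   # exploit
    # reward is revealed only for the chosen action
    reward = theta[action] @ context + rng.normal(scale=0.1)
    counts[action] += 1
    value_estimates[action] += (reward - value_estimates[action]) / counts[action]
```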

Optimal and Adaptive Algorithms for Online Boosting

no code implementations9 Feb 2015 Alina Beygelzimer, Satyen Kale, Haipeng Luo

We study online boosting, the task of converting any weak online learner into a strong online learner.

Online Gradient Boosting

no code implementations NeurIPS 2015 Alina Beygelzimer, Elad Hazan, Satyen Kale, Haipeng Luo

We extend the theory of boosting for regression problems to the online learning setting.

regression

Hardness of Online Sleeping Combinatorial Optimization Problems

no code implementations NeurIPS 2016 Satyen Kale, Chansoo Lee, Dávid Pál

We show that several online combinatorial optimization problems that admit efficient no-regret algorithms become computationally hard in the sleeping setting where a subset of actions becomes unavailable in each round.

Combinatorial Optimization PAC learning

Online Sparse Linear Regression

no code implementations7 Mar 2016 Dean Foster, Satyen Kale, Howard Karloff

We consider the online sparse linear regression problem, which is the problem of sequentially making predictions observing only a limited number of features in each round, to minimize regret with respect to the best sparse linear regressor, where prediction accuracy is measured by square loss.

regression
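
The protocol can be sketched as follows: each round the learner observes only k of the d features, predicts, then sees the label and suffers square loss. The random-subset strategy and synthetic data below are purely illustrative (the paper itself studies the hardness of this problem, not this heuristic); the sparsity pattern and step size are assumptions.

```python
# Sketch of the online sparse linear regression protocol with a feature budget.
import numpy as np

rng = np.random.default_rng(1)
d, k, T = 20, 3, 500
w_star = np.zeros(d)
w_star[:3] = [1.0, -2.0, 0.5]                          # true sparse regressor
w_hat = np.zeros(d)
cumulative_loss = 0.0

for t in range(T):
    x = rng.normal(size=d)
    observed = rng.choice(d, size=k, replace=False)    # budgeted observations
    prediction = w_hat[observed] @ x[observed]
    y = w_star @ x + rng.normal(scale=0.1)             # label revealed after predicting
    cumulative_loss += (prediction - y) ** 2
    # gradient step on the observed coordinates only
    w_hat[observed] -= 0.01 * 2 * (prediction - y) * x[observed]

print(cumulative_loss / T)
```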

Adaptive Feature Selection: Computationally Efficient Online Sparse Linear Regression under RIP

no code implementations ICML 2017 Satyen Kale, Zohar Karnin, Tengyuan Liang, Dávid Pál

Online sparse linear regression is an online problem where an algorithm repeatedly chooses a subset of coordinates to observe in an adversarially chosen feature vector, makes a real-valued prediction, receives the true label, and incurs the squared loss.

feature selection regression

Parameter-free online learning via model selection

no code implementations NeurIPS 2017 Dylan J. Foster, Satyen Kale, Mehryar Mohri, Karthik Sridharan

We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning.

Model Selection

Online Learning of Quantum States

no code implementations NeurIPS 2018 Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, Ashwin Nayak

Even in the "non-realizable" setting---where there could be arbitrary noise in the measurement outcomes---we show how to output hypothesis states that do significantly worse than the best possible states at most $\operatorname{O}\!\left(\sqrt {Tn}\right) $ times on the first $T$ measurements.

Logistic Regression: The Importance of Being Improper

no code implementations25 Mar 2018 Dylan J. Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan

Starting with the simple observation that the logistic loss is $1$-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm.

regression

Loss Decomposition for Fast Learning in Large Output Spaces

no code implementations ICML 2018 Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar

For problems with large output spaces, evaluation of the loss function and its gradient are expensive, typically taking linear time in the size of the output space.

Word Embeddings

Stochastic Negative Mining for Learning with Large Output Spaces

no code implementations16 Oct 2018 Sashank J. Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, Sanjiv Kumar

Furthermore, we identify a particularly intuitive class of loss functions in the aforementioned family and show that they are amenable to practical implementation in the large output space setting (i.e., computation is possible without evaluating scores of all labels) by developing a technique called Stochastic Negative Mining.

Retrieval

Adaptive Methods for Nonconvex Optimization

1 code implementation NeurIPS 2018 Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar

In this work, we provide a new analysis of such methods applied to nonconvex stochastic optimization problems, characterizing the effect of increasing minibatch size.

Stochastic Optimization

Escaping Saddle Points with Adaptive Gradient Methods

no code implementations26 Jan 2019 Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood.

Hypothesis Set Stability and Generalization

no code implementations NeurIPS 2019 Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan

Our main result is a generalization bound for data-dependent hypothesis sets expressed in terms of a notion of hypothesis set stability and a notion of Rademacher complexity for data-dependent hypothesis sets that we introduce.

On the Convergence of Adam and Beyond

3 code implementations ICLR 2018 Sashank J. Reddi, Satyen Kale, Sanjiv Kumar

Several recently proposed stochastic optimization methods that have been successfully used in training deep networks, such as RMSProp, Adam, Adadelta, and Nadam, are based on gradient updates scaled by square roots of exponential moving averages of squared past gradients.

Stochastic Optimization
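
A simplified numpy sketch of the update the abstract refers to: Adam-style scaling by an exponential moving average of squared gradients, together with the running-maximum correction (AMSGrad) proposed in the paper. The toy quadratic objective and hyperparameter values are illustrative assumptions, not taken from the paper's experiments.

```python
# AMSGrad keeps the running maximum of the second-moment estimate so the
# effective step size is non-increasing; Adam would divide by sqrt(v) instead.
import numpy as np

def grad(w):                     # gradient of the toy objective f(w) = ||w||^2 / 2
    return w

w = np.array([5.0, -3.0])
m = np.zeros_like(w)             # moving average of gradients
v = np.zeros_like(w)             # moving average of squared gradients
v_hat = np.zeros_like(w)         # AMSGrad: running max of v
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 1001):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    v_hat = np.maximum(v_hat, v)             # the AMSGrad modification
    w = w - lr * m / (np.sqrt(v_hat) + eps)  # Adam would use v here instead

print(w)   # close to the minimizer [0, 0]
```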

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

7 code implementations ICML 2020 Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

We obtain tight convergence rates for FedAvg and prove that it suffers from 'client drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.

Distributed Optimization Federated Learning
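
A simplified simulation of the control-variate correction at the heart of SCAFFOLD: each client corrects its local gradient steps with (c - c_i), where c_i is a client control variate and c is the server control variate, which counteracts client drift under heterogeneous data. The quadratic client objectives, step sizes, and full-participation setup below are illustrative assumptions, not the paper's full algorithm or experiments.

```python
# Toy SCAFFOLD-style simulation: client i minimizes ||w - targets[i]||^2 / 2.
import numpy as np

rng = np.random.default_rng(2)
num_clients, d, local_steps, rounds, lr = 4, 5, 10, 50, 0.1
targets = rng.normal(size=(num_clients, d))
x = np.zeros(d)                                    # global model
c = np.zeros(d)                                    # server control variate
c_i = np.zeros((num_clients, d))                   # client control variates

for _ in range(rounds):
    new_x, new_c = [], []
    for i in range(num_clients):
        y, ci = x.copy(), c_i[i]
        for _ in range(local_steps):
            g = y - targets[i]                     # local gradient
            y -= lr * (g - ci + c)                 # drift-corrected local step
        ci_plus = ci - c + (x - y) / (local_steps * lr)
        new_x.append(y)
        new_c.append(ci_plus - ci)
        c_i[i] = ci_plus
    x = np.mean(new_x, axis=0)                     # server averages client models
    c += np.mean(new_c, axis=0)                    # and updates its control variate

print(np.allclose(x, targets.mean(axis=0), atol=1e-3))  # converges to the joint optimum
```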

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

no code implementations NeurIPS 2019 Chuan Guo, Ali Mousavi, Xiang Wu, Daniel N. Holtmann-Rice, Satyen Kale, Sashank Reddi, Sanjiv Kumar

In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy.

Attribute Classification +2

A Deep Conditioning Treatment of Neural Networks

no code implementations4 Feb 2020 Naman Agarwal, Pranjal Awasthi, Satyen Kale

We study the role of depth in training randomly initialized overparameterized neural networks.

Estimating Training Data Influence by Tracing Gradient Descent

3 code implementations NeurIPS 2020 Garima Pruthi, Frederick Liu, Mukund Sundararajan, Satyen Kale

We introduce a method called TracIn that computes the influence of a training example on a prediction made by the model.
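
A minimal sketch of the checkpoint-based estimate underlying TracIn: the influence of a training point on a test point is approximated by summing, over saved checkpoints, the step size times the dot product of the two points' loss gradients at that checkpoint. The logistic-regression model, synthetic data, and checkpoint list below are stand-ins introduced for the example.

```python
# Checkpoint-style influence estimate: sum over checkpoints of
# lr * <grad loss(train), grad loss(test)>.
import numpy as np

def loss_grad(w, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * w @ x)) w.r.t. w."""
    return -y * x / (1.0 + np.exp(y * (w @ x)))

def tracin_influence(checkpoints, lrs, train_example, test_example):
    x_tr, y_tr = train_example
    x_te, y_te = test_example
    return sum(
        lr * loss_grad(w, x_tr, y_tr) @ loss_grad(w, x_te, y_te)
        for w, lr in zip(checkpoints, lrs)
    )

rng = np.random.default_rng(3)
checkpoints = [rng.normal(size=4) for _ in range(5)]   # weights saved during training
lrs = [0.1] * 5                                        # learning rate at each checkpoint
train_example = (rng.normal(size=4), +1)
test_example = (rng.normal(size=4), -1)
print(tracin_influence(checkpoints, lrs, train_example, test_example))
```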

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

1 code implementation8 Aug 2020 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

PAC-Bayes Learning Bounds for Sample-Dependent Priors

no code implementations NeurIPS 2020 Pranjal Awasthi, Satyen Kale, Stefani Karp, Mehryar Mohri

We present a series of new PAC-Bayes learning guarantees for randomized algorithms with sample-dependent priors.

Learning with User-Level Privacy

no code implementations NeurIPS 2021 Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh

We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as $O(1/\sqrt{m})$ as users provide more samples.

A Multiclass Boosting Framework for Achieving Fast and Provable Adversarial Robustness

no code implementations1 Mar 2021 Jacob Abernethy, Pranjal Awasthi, Satyen Kale

This apparent lack of robustness has led researchers to propose methods that can help to prevent an adversary from having such capabilities.

Adversarial Robustness Object Recognition

Federated Functional Gradient Boosting

no code implementations11 Mar 2021 Zebang Shen, Hamed Hassani, Satyen Kale, Amin Karbasi

First, in the semi-heterogeneous setting, when the marginal distributions of the feature vectors on client machines are identical, we develop the federated functional gradient boosting (FFGB) method that provably converges to the global minimum.

Federated Learning

SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs

no code implementations NeurIPS 2021 Satyen Kale, Ayush Sekhari, Karthik Sridharan

We show that there is an SCO problem such that GD with any step size and number of iterations can only learn at a suboptimal rate: at least $\widetilde{\Omega}(1/n^{5/12})$.

Efficient Methods for Online Multiclass Logistic Regression

no code implementations6 Oct 2021 Naman Agarwal, Satyen Kale, Julian Zimmert

Previous work (Foster et al., 2018) has highlighted the importance of improper predictors for achieving "fast rates" in the online multiclass logistic regression problem without suffering exponentially from secondary problem parameters, such as the norm of the predictors in the comparison class.

regression

Breaking the centralized barrier for cross-device federated learning

no code implementations NeurIPS 2021 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

Agnostic Learnability of Halfspaces via Logistic Loss

no code implementations31 Jan 2022 Ziwei Ji, Kwangjun Ahn, Pranjal Awasthi, Satyen Kale, Stefani Karp

In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\textrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021).

regression

Pushing the Efficiency-Regret Pareto Frontier for Online Learning of Portfolios and Quantum States

no code implementations6 Feb 2022 Julian Zimmert, Naman Agarwal, Satyen Kale

This algorithm, called SCHRODINGER'S BISONS, is the first efficient algorithm with polylogarithmic regret for this more general problem.

Mixed Federated Learning: Joint Decentralized and Centralized Learning

no code implementations26 May 2022 Sean Augenstein, Andrew Hard, Lin Ning, Karan Singhal, Satyen Kale, Kurt Partridge, Rajiv Mathews

For example, additional datacenter data can be leveraged to jointly learn from centralized (datacenter) and decentralized (federated) training data and better match an expected inference data distribution.

Federated Learning

Self-Consistency of the Fokker-Planck Equation

1 code implementation2 Jun 2022 Zebang Shen, Zhenfu Wang, Satyen Kale, Alejandro Ribeiro, Amin Karbasi, Hamed Hassani

In this paper, we exploit this concept to design a potential function of the hypothesis velocity fields, and prove that, if such a function diminishes to zero during the training procedure, the trajectory of the densities generated by the hypothesis velocity fields converges to the solution of the FPE in the Wasserstein-2 sense.

On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

no code implementations9 Jun 2022 Jianyu Wang, Rudrajit Das, Gauri Joshi, Satyen Kale, Zheng Xu, Tong Zhang

Motivated by this observation, we propose a new quantity, average drift at optimum, to measure the effects of data heterogeneity, and explicitly use it to present a new theoretical analysis of FedAvg.

Federated Learning

Beyond Uniform Lipschitz Condition in Differentially Private Optimization

no code implementations21 Jun 2022 Rudrajit Das, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi

Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded.

Benchmarking regression

Private Matrix Approximation and Geometry of Unitary Orbits

no code implementations6 Jul 2022 Oren Mangoubi, Yikai Wu, Satyen Kale, Abhradeep Guha Thakurta, Nisheeth K. Vishnoi

Consider the following optimization problem: Given $n \times n$ matrices $A$ and $\Lambda$, maximize $\langle A, U\Lambda U^*\rangle$ where $U$ varies over the unitary group $\mathrm{U}(n)$.
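
As a non-private sanity check (not the paper's private algorithm), when $A$ and $\Lambda$ are Hermitian the maximum of $\langle A, U\Lambda U^*\rangle$ over unitary $U$ is attained by aligning eigenvectors, and equals the sum of products of the eigenvalues of $A$ and $\Lambda$ sorted in the same order (von Neumann's trace inequality). The sketch below assumes this Hermitian setting with a random instance.

```python
# Verify numerically that aligning eigenvectors attains the closed-form maximum.
import numpy as np

rng = np.random.default_rng(4)
n = 4
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (B + B.conj().T) / 2                       # random Hermitian matrix
lam = np.sort(rng.normal(size=n))[::-1]        # eigenvalues of Lambda, descending
Lam = np.diag(lam)

eigvals, eigvecs = np.linalg.eigh(A)           # ascending eigenvalues of A
U = eigvecs[:, ::-1]                           # align top eigenvectors with top lam
aligned_value = np.trace(A @ U @ Lam @ U.conj().T).real
closed_form = np.sort(eigvals)[::-1] @ lam     # sum of sorted eigenvalue products

print(np.isclose(aligned_value, closed_form))  # True
```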

From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent

no code implementations13 Oct 2022 Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan

When these potentials further satisfy certain self-bounding properties, we show that they can be used to provide a convergence guarantee for Gradient Descent (GD) and SGD (even when the paths of GF and GD/SGD are quite far apart).

Retrieval

On the Convergence of Federated Averaging with Cyclic Client Participation

no code implementations6 Feb 2023 Yae Jee Cho, Pranay Sharma, Gauri Joshi, Zheng Xu, Satyen Kale, Tong Zhang

Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL).

Federated Learning

Improved Differentially Private and Lazy Online Convex Optimization

no code implementations15 Dec 2023 Naman Agarwal, Satyen Kale, Karan Singh, Abhradeep Guha Thakurta

We study the task of $(\epsilon, \delta)$-differentially private online convex optimization (OCO).

Asynchronous Local-SGD Training for Language Modeling

1 code implementation17 Jan 2024 Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato

Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication.

Distributed Optimization Language Modelling
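
A minimal sketch of the communication pattern the abstract describes for (synchronous) Local-SGD: each worker runs several SGD steps on its own data before the models are averaged. The paper studies an asynchronous variant; the toy least-squares problem, worker count, and step size here are illustrative assumptions only.

```python
# Synchronous Local-SGD on a toy least-squares problem.
import numpy as np

rng = np.random.default_rng(5)
workers, d, local_steps, rounds, lr = 4, 8, 5, 40, 0.1
w_true = rng.normal(size=d)
data = []
for _ in range(workers):
    X = rng.normal(size=(32, d))
    y = X @ w_true + 0.01 * rng.normal(size=32)
    data.append((X, y))

w = np.zeros(d)
for _ in range(rounds):
    local_models = []
    for X, y in data:
        w_local = w.copy()
        for _ in range(local_steps):           # several updates per communication
            g = X.T @ (X @ w_local - y) / len(y)
            w_local -= lr * g
        local_models.append(w_local)
    w = np.mean(local_models, axis=0)          # one communication: average models

print(np.linalg.norm(w - w_true))              # small after training
```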

Efficient Stagewise Pretraining via Progressive Subnetworks

no code implementations8 Feb 2024 Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar

RaPTr achieves better pre-training loss for BERT and UL2 language models while requiring 20-33% fewer FLOPs compared to standard training, and is competitive or better than other efficient training methods.

Stacking as Accelerated Gradient Descent

no code implementations8 Mar 2024 Naman Agarwal, Pranjal Awasthi, Satyen Kale, Eric Zhao

Stacking, a heuristic technique for training deep residual networks by progressively increasing the number of layers and initializing new layers by copying parameters from older layers, has proven quite successful in improving the efficiency of training deep neural networks.
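
A schematic of the stacking heuristic described above: the network is grown in stages, and each newly added block is initialized by copying the parameters of the current top block. The toy residual "blocks" below are just weight matrices, and the training step between stages is omitted; this is an assumption-laden illustration, not the paper's construction.

```python
# Grow a toy residual MLP by stacking: new block = copy of the current top block.
import numpy as np

rng = np.random.default_rng(6)
d = 16

def init_block():
    return rng.normal(scale=0.1, size=(d, d))

def forward(blocks, x):
    for W in blocks:                     # residual connection around each block
        x = x + np.tanh(W @ x)
    return x

blocks = [init_block()]                  # stage 1: shallow model
for stage in range(3):                   # stages 2-4: grow the network
    # ... train `blocks` for a while here ...
    blocks.append(blocks[-1].copy())     # stacking: copy the top block's weights

x = rng.normal(size=d)
print(len(blocks), forward(blocks, x).shape)   # 4 blocks, output of shape (16,)
```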
