Search Results for author: Sham M. Kakade

Found 82 papers, 10 papers with code

Recovering Structured Probability Matrices

no code implementations • 21 Feb 2016 • Qingqing Huang, Sham M. Kakade, Weihao Kong, Gregory Valiant

When can accurate reconstruction be accomplished in the sparse data regime?

Collaborative Filtering Community Detection +3

Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator

no code implementations • ICML 2018 • Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest; 3) they inherently allow for richly parameterized policies.

Continuous Control Policy Gradient Methods

A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

no code implementations • 25 Oct 2017 • Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, Aaron Sidford

This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares.

Accelerating Stochastic Gradient Descent For Least Squares Regression

no code implementations • 26 Apr 2017 • Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford

There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g., Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014.

regression Stochastic Optimization

How to Escape Saddle Points Efficiently

no code implementations • ICML 2017 • Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan

This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free").
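
The idea of perturbing gradient descent near stationary points can be illustrated with a minimal sketch. This is not the paper's algorithm: the test function, the step size, the perturbation radius, the gradient threshold, and the waiting period between perturbations are all assumptions chosen for illustration, and the paper's termination rule is omitted. Plain gradient descent started on the x-axis stalls at the saddle at the origin, while the perturbed variant escapes toward a minimum at (0, ±1).

```python
import math
import random


def grad(p):
    """Gradient of the toy function f(x, y) = x**2/2 + y**4/4 - y**2/2.

    The origin is a saddle point; the minima are at (0, 1) and (0, -1).
    """
    x, y = p
    return (x, y**3 - y)


def perturbed_gd(p0, step=0.1, radius=1e-2, g_thresh=1e-3, wait=50,
                 iters=2000, seed=0):
    """Gradient descent that adds a small random perturbation whenever the
    gradient is tiny and no perturbation was added in the last `wait` steps.
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    p = list(p0)
    last_perturb = -wait
    for t in range(iters):
        g = grad(p)
        if math.hypot(g[0], g[1]) <= g_thresh and t - last_perturb >= wait:
            # Sample a point uniformly from a disc of the given radius.
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius * math.sqrt(rng.random())
            p[0] += r * math.cos(angle)
            p[1] += r * math.sin(angle)
            last_perturb = t
            g = grad(p)
        p = [p[0] - step * g[0], p[1] - step * g[1]]
    return p


plain = perturbed_gd((1.0, 0.0), radius=0.0)  # radius 0 disables the noise
noisy = perturbed_gd((1.0, 0.0))
print("plain GD ends at:", plain)
print("perturbed GD ends at:", noisy)
```

Setting `radius=0.0` recovers plain gradient descent, which makes the contrast between the two runs easy to inspect.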

Canonical Correlation Analysis for Analyzing Sequences of Medical Billing Codes

no code implementations • 1 Dec 2016 • Corinne L. Jones, Sham M. Kakade, Lucas W. Thornblade, David R. Flum, Abraham D. Flaxman

We propose using canonical correlation analysis (CCA) to generate features from sequences of medical billing codes.

Robust Shift-and-Invert Preconditioning: Faster and More Sample Efficient Algorithms for Eigenvector Computation

no code implementations • 29 Oct 2015 • Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford

Combining our algorithm with previous work to initialize $x_0$, we obtain a number of improved sample complexity and runtime results.

Stochastic Optimization

Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis

no code implementations • 13 Apr 2016 • Rong Ge, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford

Our algorithm is linear in the input size and the number of components $k$ up to a $\log(k)$ factor.

Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent

no code implementations • NeurIPS 2016 • Chi Jin, Sham M. Kakade, Praneeth Netrapalli

While existing algorithms are efficient for the offline setting, they could be highly inefficient for the online setting.

Matrix Completion

Faster Eigenvector Computation via Shift-and-Invert Preconditioning

no code implementations • 26 May 2016 • Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford

We give faster algorithms and improved sample complexities for estimating the top eigenvector of a matrix $\Sigma$, i.e., computing a unit vector $x$ such that $x^T \Sigma x \ge (1-\epsilon)\lambda_1(\Sigma)$. Offline Eigenvector Estimation: Given an explicit $A \in \mathbb{R}^{n \times d}$ with $\Sigma = A^TA$, we show how to compute an $\epsilon$-approximate top eigenvector in time $\tilde O\left(\left[\mathrm{nnz}(A) + \frac{d \cdot \mathrm{sr}(A)}{\mathrm{gap}^2}\right] \cdot \log 1/\epsilon\right)$ and $\tilde O\left(\left[\frac{\mathrm{nnz}(A)^{3/4} (d \cdot \mathrm{sr}(A))^{1/4}}{\sqrt{\mathrm{gap}}}\right] \cdot \log 1/\epsilon\right)$.

Stochastic Optimization

Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja's Algorithm

no code implementations • 22 Feb 2016 • Prateek Jain, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford

This work provides improved guarantees for streaming principal component analysis (PCA).
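
For readers unfamiliar with Oja's algorithm, named in the title, the following is a minimal sketch of the basic update: each incoming sample nudges the current direction estimate toward the sample, followed by renormalization. The synthetic data stream, the step-size schedule `1 / (10 + t)`, and the helper names are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def oja_top_direction(stream, dim, seed=0):
    """Estimate the top principal direction from a stream of samples using
    Oja's update w <- normalize(w + eta_t * x * <x, w>).
    The step-size schedule below is an illustrative assumption."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for t, x in enumerate(stream):
        eta = 1.0 / (10.0 + t)
        proj = sum(xi * wi for xi, wi in zip(x, w))
        w = [wi + eta * proj * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


def sample_stream(n, seed=1):
    """Hypothetical data source: independent Gaussian coordinates with
    standard deviations 2, 1, 0.5, so the top direction is the first axis."""
    rng = random.Random(seed)
    scales = (2.0, 1.0, 0.5)
    for _ in range(n):
        yield [s * rng.gauss(0.0, 1.0) for s in scales]


w = oja_top_direction(sample_stream(5000), dim=3)
print("estimated top direction:", w)
```

Each sample is used once and then discarded, so memory stays proportional to the dimension, which is the point of the streaming setting.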

Super-Resolution Off the Grid

no code implementations • NeurIPS 2015 • Qingqing Huang, Sham M. Kakade

The number of measurements taken by our algorithm and its computational complexity are bounded by a polynomial in both the number of points $k$ and the dimension $d$, with no dependence on the separation $\Delta$.

Astronomy Super-Resolution

Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization

no code implementations • 24 Jun 2015 • Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

We develop a family of accelerated stochastic algorithms that minimize sums of convex functions.

Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity

no code implementations • 31 Oct 2009 • Sham M. Kakade, Ohad Shamir, Karthik Sridharan, Ambuj Tewari

The versatility of exponential families, along with their attendant convexity properties, makes them a popular and effective statistical model.

Vocal Bursts Intensity Prediction

Learning Mixtures of Gaussians in High Dimensions

no code implementations • 2 Mar 2015 • Rong Ge, Qingqing Huang, Sham M. Kakade

Unfortunately, learning mixtures of Gaussians is an information-theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case.

Learning Theory Vocal Bursts Intensity Prediction

Competing with the Empirical Risk Minimizer in a Single Pass

no code implementations • 20 Dec 2014 • Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

In the absence of computational constraints, the minimizer of a sample average of observed data -- commonly referred to as either the empirical risk minimizer (ERM) or the $M$-estimator -- is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties.

Tensor decompositions for learning latent variable models

no code implementations • 29 Oct 2012 • Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order).

Random design analysis of ridge regression

no code implementations • 13 Jun 2011 • Daniel Hsu, Sham M. Kakade, Tong Zhang

The analysis also reveals the effect of errors in the estimated covariance structure, as well as the effect of modeling errors, neither of which is present in the fixed design setting.

LEMMA regression

A Tensor Approach to Learning Mixed Membership Community Models

no code implementations • 12 Feb 2013 • Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade

We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method.

Community Detection Stochastic Block Model

Least Squares Revisited: Scalable Approaches for Multi-class Prediction

no code implementations • 7 Oct 2013 • Alekh Agarwal, Sham M. Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant

This work provides simple algorithms for multi-class (and multi-label) prediction in settings where both the number of examples n and the data dimension d are relatively large.

A Risk Comparison of Ordinary Least Squares vs Ridge Regression

no code implementations • 4 May 2011 • Paramveer S. Dhillon, Dean P. Foster, Sham M. Kakade, Lyle H. Ungar

We compare the risk of ridge regression to a simple variant of ordinary least squares, in which one simply projects the data onto a finite dimensional subspace (as specified by a Principal Component Analysis) and then performs an ordinary (un-regularized) least squares regression in this subspace.

regression

Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints

no code implementations • 24 Sep 2012 • Animashree Anandkumar, Daniel Hsu, Adel Javanmard, Sham M. Kakade

The sufficient conditions for identifiability of these models are primarily based on weak expansion constraints on the topic-word matrix, for topic models, and on the directed acyclic graph, for Bayesian networks.

Topic Models

Coupled Recurrent Models for Polyphonic Music Composition

no code implementations • 20 Nov 2018 • John Thickstun, Zaid Harchaoui, Dean P. Foster, Sham M. Kakade

This paper introduces a novel recurrent model for music composition that is tailored to the structure of polyphonic music.

Time Series Analysis

Provably Correct Automatic Sub-Differentiation for Qualified Programs

no code implementations • NeurIPS 2018 • Sham M. Kakade, Jason D. Lee

The Cheap Gradient Principle (Griewank, 2008) --- the computational cost of computing a $d$-dimensional vector of partial derivatives of a scalar function is nearly the same (often within a factor of $5$) as that of simply computing the scalar function itself --- is of central importance in optimization; it allows us to quickly obtain (high-dimensional) gradients of scalar loss functions which are subsequently used in black box gradient-based optimization procedures.

Convergence rates of sub-sampled Newton methods

no code implementations • NeurIPS 2015 • Kamalika Chaudhuri, Sham M. Kakade, Praneeth Netrapalli, Sujay Sanghavi

Provided certain conditions hold on the model class, we provide a two-stage active learning algorithm for this problem.

Active Learning Binary Classification +3

A Spectral Algorithm for Latent Dirichlet Allocation

no code implementations • NeurIPS 2012 • Anima Anandkumar, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Yi-Kai Liu

This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of topic models, including Latent Dirichlet Allocation (LDA).

Clustering Topic Models

Learning Mixtures of Tree Graphical Models

no code implementations • NeurIPS 2012 • Anima Anandkumar, Daniel J. Hsu, Furong Huang, Sham M. Kakade

We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables.

Identifiability and Unmixing of Latent Parse Trees

no code implementations • NeurIPS 2012 • Daniel J. Hsu, Sham M. Kakade, Percy S. Liang

This paper explores unsupervised learning of parsing models along two directions.

Dependency Parsing

Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression

no code implementations • NeurIPS 2011 • Sham M. Kakade, Varun Kanade, Ohad Shamir, Adam Kalai

In this paper, we provide algorithms for learning GLMs and SIMs, which are both computationally and statistically efficient.

regression

Stochastic convex optimization with bandit feedback

no code implementations • NeurIPS 2011 • Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin

This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $X$ under a stochastic bandit feedback model.

Spectral Methods for Learning Multivariate Latent Tree Structure

no code implementations • NeurIPS 2011 • Animashree Anandkumar, Kamalika Chaudhuri, Daniel J. Hsu, Sham M. Kakade, Le Song, Tong Zhang

The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables).

On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

no code implementations • NeurIPS 2008 • Sham M. Kakade, Karthik Sridharan, Ambuj Tewari

We provide sharp bounds for Rademacher and Gaussian complexities of (constrained) linear classes.

On the Generalization Ability of Online Strongly Convex Programming Algorithms

no code implementations • NeurIPS 2008 • Sham M. Kakade, Ambuj Tewari

This paper examines the generalization properties of online convex programming algorithms when the loss function is Lipschitz and strongly convex.

Mind the Duality Gap: Logarithmic regret algorithms for online optimization

no code implementations • NeurIPS 2008 • Shai Shalev-Shwartz, Sham M. Kakade

We describe a primal-dual framework for the design and analysis of online strongly convex optimization algorithms.

Rethinking learning rate schedules for stochastic optimization

no code implementations • ICLR 2019 • Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

One plausible explanation is that non-convex neural network training procedures are better suited to the use of fundamentally different learning rate schedules, such as the "cut the learning rate every constant number of epochs" method (which more closely resembles an exponentially decaying learning rate schedule); note that this widely used schedule is in stark contrast to the polynomial decay schemes prescribed in the stochastic approximation literature, which are indeed shown to be (worst case) optimal for classes of convex optimization problems.

Stochastic Optimization

Global Convergence of Policy Gradient Methods for Linearized Control Problems

no code implementations • ICLR 2018 • Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest; 3) they inherently allow for richly parameterized policies.

Continuous Control Policy Gradient Methods

A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm

no code implementations • 11 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

In this note, we derive concentration inequalities for random vectors with subGaussian norm (a generalization of both subGaussian random vectors and norm bounded random vectors), which are tight up to logarithmic factors.

Maximum Likelihood Estimation for Learning Populations of Parameters

no code implementations • 12 Feb 2019 • Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham M. Kakade

Precisely, for sufficiently large $N$, the MLE achieves the information theoretic optimal error bound of $\mathcal{O}(\frac{1}{t})$ for $t < c\log{N}$, with respect to the earth mover's distance (between the estimated and true distributions).

On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points

no code implementations • 13 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

More recent theory has shown that GD and SGD can avoid saddle points, but the dependence on dimension in these analyses is polynomial.

BIG-bench Machine Learning

Online Control with Adversarial Disturbances

no code implementations • 23 Feb 2019 • Naman Agarwal, Brian Bullins, Elad Hazan, Sham M. Kakade, Karan Singh

We study the control of a linear dynamical system with adversarial disturbances (as opposed to statistical noise).

Calibration, Entropy Rates, and Memory in Language Models

no code implementations • ICML 2020 • Mark Braverman, Xinyi Chen, Sham M. Kakade, Karthik Narasimhan, Cyril Zhang, Yi Zhang

Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing.

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

no code implementations • 1 Aug 2019 • Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan

Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces.

Policy Gradient Methods

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

no code implementations • ICLR 2020 • Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang

With regard to the statistical viewpoint, this question is largely unexplored, and the extant body of literature mainly focuses on conditions which permit sample efficient reinforcement learning, with little understanding of what conditions are necessary for efficient reinforcement learning.

Imitation Learning reinforcement-learning +1

The Nonstochastic Control Problem

no code implementations • 27 Nov 2019 • Elad Hazan, Sham M. Kakade, Karan Singh

We consider the problem of controlling an unknown linear dynamical system in the presence of (nonstochastic) adversarial perturbations and adversarial convex loss functions.

Optimal Estimation of Change in a Population of Parameters

no code implementations • 28 Nov 2019 • Ramya Korlakai Vinayak, Weihao Kong, Sham M. Kakade

Provided these paired observations, $\{(X_i, Y_i) \}_{i=1}^N$, our goal is to accurately estimate the distribution of the change in parameters, $\delta_i := q_i - p_i$, over the population and properties of interest like the $\ell_1$-magnitude of the change with sparse observations ($t\ll N$).

Epidemiology

Few-Shot Learning via Learning the Representation, Provably

no code implementations • ICLR 2021 • Simon S. Du, Wei Hu, Sham M. Kakade, Jason D. Lee, Qi Lei

First, we study the setting where this common representation is low-dimensional and provide a fast rate of $O\left(\frac{\mathcal{C}\left(\Phi\right)}{n_1T} + \frac{k}{n_2}\right)$; here, $\Phi$ is the representation function class, $\mathcal{C}\left(\Phi\right)$ is its complexity measure, and $k$ is the dimension of the representation.

Few-Shot Learning Representation Learning

Is Long Horizon Reinforcement Learning More Difficult Than Short Horizon Reinforcement Learning?

no code implementations • 1 May 2020 • Ruosong Wang, Simon S. Du, Lin F. Yang, Sham M. Kakade

Our analysis introduces two ideas: (i) the construction of an $\varepsilon$-net for optimal policies whose log-covering number scales only logarithmically with the planning horizon, and (ii) the Online Trajectory Synthesis algorithm, which adaptively evaluates all policies in a given policy class using sample complexity that scales with the log-covering number of the given policy class.

reinforcement-learning Reinforcement Learning (RL)

Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

no code implementations • NeurIPS 2020 • Chi Jin, Sham M. Kakade, Akshay Krishnamurthy, Qinghua Liu

Partial observability is a common challenge in many reinforcement learning applications, which requires an agent to maintain memory, infer latent states, and integrate this past information into exploration.

reinforcement-learning Reinforcement Learning (RL)

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

no code implementations • NeurIPS 2020 • Kaiqing Zhang, Sham M. Kakade, Tamer Başar, Lin F. Yang

This is in contrast to the usual reward-aware setting, with a $\tilde\Omega(|S|(|A|+|B|)(1-\gamma)^{-3}\epsilon^{-2})$ lower bound, where this model-based approach is near-optimal with only a gap on the $|A|,|B|$ dependence.

Model-based Reinforcement Learning Reinforcement Learning (RL)

What are the Statistical Limits of Batch RL with Linear Function Approximation?

no code implementations • ICLR 2021 • Ruosong Wang, Dean Foster, Sham M. Kakade

Function approximation methods coupled with batch reinforcement learning (or off-policy reinforcement learning) are providing an increasingly important framework to help alleviate the excessive sample complexity burden in modern reinforcement learning problems.

reinforcement-learning Reinforcement Learning (RL)

What are the Statistical Limits of Offline RL with Linear Function Approximation?

no code implementations • 22 Oct 2020 • Ruosong Wang, Dean P. Foster, Sham M. Kakade

Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies.

Decision Making Offline RL +2

A Spectral Algorithm for Learning Hidden Markov Models

no code implementations • 26 Nov 2008 • Daniel Hsu, Sham M. Kakade, Tong Zhang

Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series.

Time Series Time Series Analysis

Instabilities of Offline RL with Pre-Trained Neural Representation

no code implementations • 8 Mar 2021 • Ruosong Wang, Yifan Wu, Ruslan Salakhutdinov, Sham M. Kakade

In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.

Offline RL Reinforcement Learning (RL)

Bilinear Classes: A Structural Framework for Provable Generalization in RL

no code implementations • 19 Mar 2021 • Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang

The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space.

Benign Overfitting of Constant-Stepsize SGD for Linear Regression

no code implementations • 23 Mar 2021 • Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

More specifically, for SGD with iterate averaging, we demonstrate the sharpness of the established excess risk bound by proving a matching lower bound (up to constant factors).

regression

An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap

no code implementations • NeurIPS 2021 • Yuanhao Wang, Ruosong Wang, Sham M. Kakade

This work focuses on this question in the standard online reinforcement learning setting, where our main result resolves this question in the negative: our hardness result shows that an exponential sample complexity lower bound still holds even if a constant suboptimality gap is assumed in addition to having a linearly realizable optimal $Q$-function.

reinforcement-learning Reinforcement Learning (RL)

A Method of Moments for Mixture Models and Hidden Markov Models

1 code implementation • 3 Mar 2012 • Animashree Anandkumar, Daniel Hsu, Sham M. Kakade

Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations.

A Short Note on the Relationship of Information Gain and Eluder Dimension

no code implementations • 6 Jul 2021 • Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei

Eluder dimension and information gain are two widely used complexity measures in bandit and reinforcement learning.

LEMMA reinforcement-learning +1

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization

no code implementations • NeurIPS 2021 • Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problems and two-layer neural network with polynomial activation bandit problem.

Going Beyond Linear RL: Sample Efficient Neural Function Approximation

no code implementations • NeurIPS 2021 • Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

While the theory of RL has traditionally focused on linear function approximation (or eluder dimension) approaches, little is known about nonlinear RL with neural net approximations of the Q functions.

Reinforcement Learning (RL)

The Benefits of Implicit Regularization from SGD in Least Squares Problems

no code implementations • NeurIPS 2021 • Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean P. Foster, Sham M. Kakade

Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which has been hypothesized to play an important role in the generalization of modern machine learning approaches.

regression

Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression

no code implementations • 12 Oct 2021 • Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

In this paper, we provide a problem-dependent analysis on the last iterate risk bounds of SGD with decaying stepsize, for (overparameterized) linear regression problems.

regression

An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap

no code implementations • NeurIPS 2021 • Yuanhao Wang, Ruosong Wang, Sham M. Kakade

The recent and remarkable result of Weisz et al. (2020) resolves this question in the negative, providing an exponential (in $d$) sample size lower bound, which holds even if the agent has access to a generative model of the environment.

reinforcement-learning Reinforcement Learning (RL)

The Statistical Complexity of Interactive Decision Making

no code implementations • 27 Dec 2021 • Dylan J. Foster, Sham M. Kakade, Jian Qian, Alexander Rakhlin

The main result of this work provides a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.

Decision Making reinforcement-learning +1

Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

no code implementations • 7 Mar 2022 • Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

Stochastic gradient descent (SGD) has achieved great success due to its superior performance in both optimization and generalization.

The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

no code implementations • 3 Aug 2022 • Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

Our bounds suggest that for a large class of linear regression instances, transfer learning with $O(N^2)$ source data (and scarce or no target data) is as effective as supervised learning with $N$ target data.

regression Transfer Learning

Deep Inventory Management

no code implementations • 6 Oct 2022 • Dhruv Madeka, Kari Torkkola, Carson Eisenach, Anna Luo, Dean P. Foster, Sham M. Kakade

This work provides a Deep Reinforcement Learning approach to solving a periodic review inventory control system with stochastic vendor lead times, lost sales, correlated demand, and price matching.

Management Model-based Reinforcement Learning +2

The Role of Coverage in Online Reinforcement Learning

no code implementations • 9 Oct 2022 • Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade

Coverage conditions -- which assert that the data logging distribution adequately covers the state space -- play a fundamental role in determining the sample complexity of offline reinforcement learning.

Efficient Exploration Offline RL +2

Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity

no code implementations • 18 Oct 2022 • Abhishek Gupta, Aldo Pacchiano, Yuexiang Zhai, Sham M. Kakade, Sergey Levine

Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications, but in practice the choice of reward function can be crucial for good results -- while in principle the reward only needs to specify what the task is, in reality practitioners often need to design more detailed rewards that provide the agent with some hints about how the task should be completed.

reinforcement-learning Reinforcement Learning (RL)

Learning Hidden Markov Models Using Conditional Samples

no code implementations • 28 Feb 2023 • Sham M. Kakade, Akshay Krishnamurthy, Gaurav Mahajan, Cyril Zhang

In this paper, we depart from this setup and consider an interactive access model, in which the algorithm can query for samples from the conditional distributions of the HMMs.

Time Series Time Series Analysis

Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

no code implementations • 3 Mar 2023 • Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

On the other hand, we provide some negative results for stochastic gradient descent (SGD) for ReLU regression with symmetric Bernoulli data: if the model is well-specified, the excess risk of SGD is provably no better than that of GLM-tron ignoring constant factors, for each problem instance; and in the noiseless case, GLM-tron can achieve a small risk while SGD unavoidably suffers from a constant risk in expectation.

regression Vocal Bursts Intensity Prediction

Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games

no code implementations • 22 Mar 2023 • Dylan J. Foster, Noah Golowich, Sham M. Kakade

They are proven via lower bounds for a simpler problem we refer to as SparseCCE, in which the goal is to compute a coarse correlated equilibrium that is sparse in the sense that it can be represented as a mixture of a small number of product policies.

Computational Efficiency Multi-agent Reinforcement Learning

Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent

no code implementations • 18 Apr 2024 • Yiwen Kou, Zixiang Chen, Quanquan Gu, Sham M. Kakade

We then demonstrate how a neural network trained with SGD can effectively approximate this good network, solving the $k$-parity problem with small statistical errors.

A Smoother Way to Train Structured Prediction Models

1 code implementation • NeurIPS 2018 • Krishna Pillutla, Vincent Roulet, Sham M. Kakade, Zaid Harchaoui

We present a framework to train a structured prediction model by performing smoothing on the inference algorithm it builds upon.

named-entity-recognition Named Entity Recognition +3

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

2 code implementations • 21 Dec 2009 • Niranjan Srinivas, Andreas Krause, Sham M. Kakade, Matthias Seeger

Many applications require optimizing an unknown, noisy function that is expensive to evaluate.

Experimental Design

Provably Efficient Maximum Entropy Exploration

2 code implementations • 6 Dec 2018 • Elad Hazan, Sham M. Kakade, Karan Singh, Abby Van Soest

Suppose an agent is in a (possibly unknown) Markov Decision Process in the absence of a reward signal: what might we hope that an agent can efficiently learn to do?

Invariances and Data Augmentation for Supervised Music Transcription

1 code implementation • 13 Nov 2017 • John Thickstun, Zaid Harchaoui, Dean Foster, Sham M. Kakade

This paper explores a variety of models for frame-based music transcription, with an emphasis on the methods needed to reach state-of-the-art on human recordings.

Data Augmentation Music Transcription +1

Robust Aggregation for Federated Learning

2 code implementations • arXiv preprint 2019 • Krishna Pillutla, Sham M. Kakade, Zaid Harchaoui

We present a robust aggregation approach that makes federated learning robust to settings where a fraction of the devices may be sending corrupted updates to the server.

Additive models Federated Learning +1
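
The snippet above does not say which aggregator is used, so the following is only a hedged illustration of what robust aggregation can look like: it replaces the coordinate-wise mean of client updates with a geometric median computed by Weiszfeld-style iterations. The choice of the geometric median, the iteration count, the `eps` guard, and the toy updates are all assumptions for this sketch.

```python
import math


def mean(points):
    """Coordinate-wise average of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geometric_median(points, iters=100, eps=1e-9):
    """Weiszfeld-style iteration: repeatedly re-average the points with
    weights inversely proportional to their distance from the estimate.
    `eps` guards against division by zero (an illustrative choice)."""
    z = mean(points)
    for _ in range(iters):
        weights = [1.0 / max(eps, distance(p, z)) for p in points]
        total = sum(weights)
        z = [sum(w * p[j] for w, p in zip(weights, points)) / total
             for j in range(len(z))]
    return z


# Four hypothetical honest updates near (1, 1) and one corrupted update.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [100.0, 100.0]]
avg = mean(updates)
med = geometric_median(updates)
print("mean:", avg)
print("geometric median:", med)
```

A single corrupted update drags the mean far from the honest cluster, while the geometric median stays close to it.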

Repeat After Me: Transformers are Better than State Space Models at Copying

1 code implementation • 1 Feb 2024 • Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach

Empirically, we find that transformers outperform GSSMs in terms of efficiency and generalization on synthetic tasks that require copying the context.

Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification

1 code implementation • 12 Oct 2016 • Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford

In particular, this work provides a sharp analysis of: (1) mini-batching, a method of averaging many samples of a stochastic gradient both to reduce the variance of the stochastic gradient estimate and to parallelize SGD, and (2) tail-averaging, a method involving averaging the final few iterates of SGD to decrease the variance in SGD's final iterate.

regression
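
The two techniques named above can be sketched together on a synthetic least squares problem. This is an illustration under stated assumptions rather than the paper's setup: the data model (standard Gaussian features, Gaussian label noise), the constant step size, the batch size, and the choice to average the last half of the iterates are all hypothetical.

```python
import random


def minibatch_tail_averaged_sgd(w_star, steps=2000, batch=8, step=0.05,
                                noise=0.5, seed=0):
    """Run constant-step SGD on synthetic least squares data.

    Each step averages `batch` stochastic gradients (mini-batching); the
    returned tail average is the mean of the last half of the iterates
    (tail-averaging). All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    d = len(w_star)
    w = [0.0] * d
    tail_start = steps // 2
    tail_sum = [0.0] * d
    for t in range(steps):
        g = [0.0] * d
        for _ in range(batch):
            x = [rng.gauss(0.0, 1.0) for _ in range(d)]
            y = sum(a * b for a, b in zip(w_star, x)) + noise * rng.gauss(0.0, 1.0)
            resid = sum(a * b for a, b in zip(w, x)) - y
            for j in range(d):
                g[j] += resid * x[j] / batch  # averaged squared-loss gradient
        w = [wj - step * gj for wj, gj in zip(w, g)]
        if t >= tail_start:
            tail_sum = [s + wj for s, wj in zip(tail_sum, w)]
    w_tail = [s / (steps - tail_start) for s in tail_sum]
    return w, w_tail


target = [1.0, -2.0]
w_last, w_tail = minibatch_tail_averaged_sgd(target)
print("final iterate:", w_last)
print("tail average: ", w_tail)
```

The gradients inside a batch are independent of one another, which is what makes the inner loop a natural candidate for parallel execution.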

On the insufficiency of existing momentum schemes for Stochastic Optimization

2 code implementations • ICLR 2018 • Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade

Extensive empirical results in this paper show that ASGD has performance gains over HB, NAG, and SGD.

Stochastic Optimization

The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares

1 code implementation • NeurIPS 2019 • Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

First, this work shows that even if the time horizon $T$ (i.e., the number of iterations SGD is run for) is known in advance, SGD's final iterate behavior with any polynomially decaying learning rate scheme is highly sub-optimal compared to the minimax rate (by a condition number factor in the strongly convex case and a factor of $\sqrt{T}$ in the non-strongly convex case).

Stochastic Optimization
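
To make the contrast between the two families of schedules concrete, here is a small sketch of a geometrically decaying step schedule next to a polynomially decaying one. The specific choices (halving the rate, using roughly log2(T) phases of equal length, and the `power` parameter) are assumptions for illustration and may differ from the schedule analyzed in the paper.

```python
import math


def step_decay(t, horizon, eta0):
    """Learning rate at iteration t under a step decay schedule: the rate
    is halved at the start of each phase, with about log2(horizon) phases
    of equal length. Halving and the phase count are illustrative."""
    phases = max(1, int(math.log2(horizon)))
    phase_len = math.ceil(horizon / phases)
    return eta0 / (2 ** (t // phase_len))


def polynomial_decay(t, eta0, power=1.0):
    """Learning rate at iteration t under polynomial decay eta0 / (1 + t)**power."""
    return eta0 / (1.0 + t) ** power


horizon = 1024
for t in (0, 102, 103, 512, 1023):
    print(t, step_decay(t, horizon, 1.0), polynomial_decay(t, 1.0))
```

Unlike polynomial decay, the step schedule needs the horizon in advance, which matches the abstract's framing of a known time horizon.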
