no code implementations • 14 Feb 2024 • Arun Suggala, Y. Jennifer Sun, Praneeth Netrapalli, Elad Hazan
We show that our algorithm achieves optimal (in terms of horizon) regret bounds for a large class of convex functions that we call $\kappa$-convex.
no code implementations • 14 Feb 2024 • Yashas Samaga B L, Varun Yerram, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli
Autoregressive decoding with generative Large Language Models (LLMs) on accelerators (GPUs/TPUs) is often memory-bound: most of the time is spent transferring model parameters from high bandwidth memory (HBM) to cache.
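For intuition, here is a back-of-the-envelope sketch of why decoding is memory-bound: every decode step must stream all model weights from HBM at least once, so HBM bandwidth alone lower-bounds per-token latency. The model size and bandwidth figures below are illustrative assumptions, not numbers from the paper.

```python
# Rough lower bound on per-token decode latency when decoding is memory-bound:
# every step reads all model parameters from HBM at least once.
def decode_latency_lower_bound(num_params, bytes_per_param, hbm_bandwidth_bytes_per_s):
    return (num_params * bytes_per_param) / hbm_bandwidth_bytes_per_s

# Hypothetical 8B-parameter model in bfloat16 on an accelerator with ~800 GB/s of HBM bandwidth.
latency_s = decode_latency_lower_bound(8e9, 2, 800e9)
print(f"per-token lower bound ~ {latency_s * 1e3:.0f} ms")  # ~20 ms, regardless of compute speed
```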
no code implementations • 13 Feb 2024 • Aishwarya P S, Pranav Ajit Nair, Yashas Samaga, Toby Boyd, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli
On the PaLM2 pretraining dataset, a tandem of PaLM2-Bison and PaLM2-Gecko demonstrates a 3.3% improvement in next-token prediction accuracy over a standalone PaLM2-Gecko, offering a 1.16x speedup compared to a PaLM2-Otter model with comparable downstream performance.
no code implementations • 30 Nov 2023 • Lénaïc Chizat, Praneeth Netrapalli
In this paper, we introduce a key notion to predict and control feature learning: the angle $\theta_\ell$ between the feature updates and the backward pass (at layer index $\ell$).
no code implementations • 25 Jun 2023 • Dheeraj Baby, Aniket Das, Dheeraj Nagaraj, Praneeth Netrapalli
Our work shows that we can estimate $\mathbf{w}^{*}$ in squared norm up to an error of $\tilde{O}\left(\|\mathbf{f}^{*}\|^2 \cdot \left(\frac{1}{n} + \left(\frac{d}{n}\right)^2\right)\right)$ and prove a matching lower bound (up to log factors).
1 code implementation • 18 Oct 2022 • Harikrishna Narasimhan, Harish G. Ramaswamy, Shiv Kumar Tavker, Drona Khurana, Praneeth Netrapalli, Shivani Agarwal
We present consistent algorithms for multiclass learning with complex performance metrics and constraints, where the objective and constraints are defined by arbitrary functions of the confusion matrix.
no code implementations • 11 Oct 2022 • Naman Agarwal, Prateek Jain, Suhas Kowshik, Dheeraj Nagaraj, Praneeth Netrapalli
In this work, we consider the problem of collaborative multi-user reinforcement learning.
no code implementations • 4 Oct 2022 • Sravanti Addepalli, Anshul Nasery, R. Venkatesh Babu, Praneeth Netrapalli, Prateek Jain
To bridge the gap between these two lines of work, we first hypothesize and verify that while SB may not altogether preclude learning complex features, it amplifies simpler features over complex ones.
no code implementations • 29 Sep 2022 • Qinghua Liu, Praneeth Netrapalli, Csaba Szepesvári, Chi Jin
We prove that OMLE learns the near-optimal policies of an enormously rich class of sequential decision making problems in a polynomial number of samples.
no code implementations • 19 Aug 2022 • Anshul Nasery, Sravanti Addepalli, Praneeth Netrapalli, Prateek Jain
We consider the problem of OOD generalization, where the goal is to train a model that performs well on test distributions that are different from the training distribution.
no code implementations • 17 Jun 2022 • Ashwin Vaswani, Gaurav Aggarwal, Praneeth Netrapalli, Narayan G Hegde
Compared to standard multilabel baselines, CHAMP provides improved AUPRC both in terms of robustness (8.87% mean percentage improvement) and in low-data regimes.
1 code implementation • 17 Jun 2022 • Kushal Majmundar, Sachin Goyal, Praneeth Netrapalli, Prateek Jain
Typical contrastive learning based SSL methods require instance-wise data augmentations which are difficult to design for unstructured tabular data.
no code implementations • 9 Feb 2022 • Kwangjun Ahn, Prateek Jain, Ziwei Ji, Satyen Kale, Praneeth Netrapalli, Gil I. Shamir
We initiate a formal study of reproducibility in optimization.
no code implementations • NeurIPS 2021 • Ankit Garg, Robin Kothari, Praneeth Netrapalli, Suhail Sherif
We study the complexity of optimizing highly smooth convex functions.
no code implementations • NeurIPS 2021 • Kiran K. Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh
To cope with such data scarcity, meta-representation learning methods train across many related tasks to find a shared (lower-dimensional) representation of the data where all tasks can be solved accurately.
no code implementations • ICLR 2022 • Naman Agarwal, Syomantak Chaudhuri, Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli
The starting point of our work is the observation that in practice, Q-learning is used with two important modifications: (i) training with two networks simultaneously, called the online network and the target network (online target learning, or OTL), and (ii) experience replay (ER) (Mnih et al., 2015).
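As a reference point, a compressed PyTorch-style sketch of these two modifications follows; it is a generic caricature, not the specific scheme analyzed in the paper, and the buffer format (transitions stored as tensors with integer actions), batch size, and discount factor are assumptions.

```python
import random
import torch
import torch.nn.functional as F

def q_update(online, target, replay_buffer, optimizer, batch_size=32, gamma=0.99):
    # Experience replay (ER): sample a minibatch of past transitions from the buffer.
    s, a, r, s_next, done = map(torch.stack, zip(*random.sample(replay_buffer, batch_size)))
    with torch.no_grad():
        # Bootstrap targets come from the slowly updated target network (OTL).
        y = r + gamma * (1.0 - done.float()) * target(s_next).max(dim=1).values
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

def sync_target(online, target):
    # Periodically copy the online network's weights into the target network.
    target.load_state_dict(online.state_dict())
```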
1 code implementation • ICLR 2022 • Vihari Piratla, Praneeth Netrapalli, Sunita Sarawagi
We consider the problem of training a classification model with group annotated training data.
1 code implementation • ICLR 2022 • Tanner Fiez, Chi Jin, Praneeth Netrapalli, Lillian J. Ratliff
This paper considers minimax optimization $\min_x \max_y f(x, y)$ in the challenging setting where $f$ can be both nonconvex in $x$ and nonconcave in $y$.
no code implementations • NeurIPS 2021 • Prateek Jain, Suhas S Kowshik, Dheeraj Nagaraj, Praneeth Netrapalli
In this work, we improve existing results for learning nonlinear systems in a number of ways: a) we provide the first offline algorithm that can learn non-linear dynamical systems without the mixing assumption, b) we significantly improve upon the sample complexity of existing results for mixing systems, c) in the much harder one-pass, streaming setting we study an SGD with Reverse Experience Replay ($\mathsf{SGD-RER}$) method, and demonstrate that for mixing systems, it achieves the same sample complexity as our offline algorithm, d) we justify the expansivity assumption by showing that for the popular ReLU link function -- a non-expansive but easy-to-learn link function with i.i.d.
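A minimal sketch of the reverse-replay idea, for a linear system $x_{t+1} \approx A^* x_t + \text{noise}$: fill a buffer from the stream, then take SGD steps over the buffer in reverse order. The buffer size and step size are placeholder assumptions, and details used in the paper's analysis (e.g., gaps between buffers) are omitted.

```python
import numpy as np

def sgd_rer(stream, d, buffer_size=100, lr=0.01):
    """stream yields (x_t, x_{t+1}) pairs from a single pass over the trajectory."""
    A_hat = np.zeros((d, d))
    buffer = []
    for x_t, x_next in stream:
        buffer.append((x_t, x_next))
        if len(buffer) == buffer_size:
            for x, y in reversed(buffer):            # replay the buffer in reverse order
                grad = np.outer(A_hat @ x - y, x)    # gradient of 0.5 * ||A_hat x - y||^2 w.r.t. A_hat
                A_hat -= lr * grad
            buffer = []
    return A_hat
```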
no code implementations • 18 May 2021 • Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh
We show that, for a constant subspace dimension MLLAM obtains nearly-optimal estimation error, despite requiring only $\Omega(\log d)$ samples per task.
no code implementations • NeurIPS 2021 • Prateek Jain, Suhas S Kowshik, Dheeraj Nagaraj, Praneeth Netrapalli
Thus, we provide the first -- to the best of our knowledge -- optimal SGD-style algorithm for the classical problem of linear system identification with a first order oracle.
1 code implementation • NeurIPS 2021 • Harshay Shah, Prateek Jain, Praneeth Netrapalli
We believe that the DiffROAR evaluation framework and BlockMNIST-based datasets can serve as sanity checks to audit instance-specific interpretability methods; code and data available at https://github.com/harshays/inputgradients.
no code implementations • 15 Feb 2021 • Aadirupa Saha, Nagarajan Natarajan, Praneeth Netrapalli, Prateek Jain
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where the cost/reward functions $f_t$ admit a "pseudo-1d" structure, i.e., $f_t(w) = \ell_t(\mathrm{pred}_t(w))$, where the output of $\mathrm{pred}_t$ is one-dimensional.
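A toy instance of this pseudo-1d structure, for concreteness: the $d$-dimensional decision $w$ only enters the loss through a scalar prediction. The particular features, target, and squared loss below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
x_t, y_t = rng.standard_normal(d), 1.3      # context and target for round t (illustrative)

def pred_t(w):
    return float(x_t @ w)                   # one-dimensional output

def loss_t(z):
    return (z - y_t) ** 2                   # 1-d loss applied to the scalar prediction

def f_t(w):
    return loss_t(pred_t(w))                # the round-t cost, revealed only via bandit feedback

print(f_t(np.zeros(d)))                     # the learner only ever observes such function values
```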
no code implementations • NeurIPS 2020 • Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims
In this work, we present MOReL, an algorithmic framework for model-based offline RL.
no code implementations • NeurIPS 2020 • Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh
Further, instead of a PO if we only have a linear minimization oracle (LMO, a la Frank-Wolfe) to access the constraint set, an extension of our method, MOLES, finds a feasible $\epsilon$-suboptimal solution using $O(\epsilon^{-2})$ LMO calls and FO calls---both match known lower bounds, resolving a question left open since White (1993).
no code implementations • 19 Jun 2020 • Kartik Gupta, Arun Sai Suggala, Adarsh Prasad, Praneeth Netrapalli, Pradeep Ravikumar
We view the problem of designing minimax estimators as finding a mixed strategy Nash equilibrium of a zero-sum game.
no code implementations • NeurIPS 2020 • Guy Bresler, Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli, Xian Wu
Our improved rate serves as one of the first results where an algorithm outperforms SGD-DD on an interesting Markov chain and also provides one of the first theoretical analyses to support the use of experience replay in practice.
2 code implementations • NeurIPS 2020 • Harshay Shah, Kaustav Tamuly, aditi raghunathan, Prateek Jain, Praneeth Netrapalli
Furthermore, previous settings that use SB to theoretically justify why neural networks generalize well do not simultaneously capture the non-robustness of neural networks---a widely observed phenomenon in practice [Goodfellow et al. 2014, Jo and Bengio 2017].
no code implementations • NeurIPS 2020 • Arun Sai Suggala, Praneeth Netrapalli
For Lipschitz and smooth nonconvex-nonconcave games, our algorithm requires access to an optimization oracle which computes the perturbed best response.
1 code implementation • 18 May 2020 • Vivek Gupta, Ankit Saw, Pegah Nokhiz, Praneeth Netrapalli, Piyush Rai, Partha Talukdar
One of the key reasons is that a longer document is likely to contain words from many different topics; hence, creating a single vector while ignoring all the topical structure is unlikely to yield an effective document representation.
2 code implementations • 12 May 2020 • Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims
In this work, we present MOReL, an algorithmic framework for model-based offline RL.
2 code implementations • ICML 2020 • Vihari Piratla, Praneeth Netrapalli, Sunita Sarawagi
The domain specific components are discarded after training and only the common component is retained.
Ranked #1 on Domain Generalization on LipitK
no code implementations • 21 Oct 2019 • Abhishek Panigrahi, Raghav Somani, Navin Goyal, Praneeth Netrapalli
What enables Stochastic Gradient Descent (SGD) to achieve better generalization than Gradient Descent (GD) in Neural Network training?
2 code implementations • NeurIPS 2019 • Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh
This paper studies first order methods for solving smooth minimax optimization problems $\min_x \max_y g(x, y)$ where $g(\cdot,\cdot)$ is smooth and $g(x,\cdot)$ is concave for each $x$.
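For contrast, here is a generic two-timescale gradient descent-ascent baseline on a toy instance of this problem class; it is not the accelerated method developed in the paper, and the test function and step sizes are assumptions.

```python
import numpy as np

# Toy objective g(x, y) = x*y - 0.5*y**2: smooth, and concave in y for every x.
def grad_g(x, y):
    return np.array([y, x - y])             # (dg/dx, dg/dy)

x, y = 1.0, 0.0
eta_x, eta_y = 0.01, 0.1                    # slower descent in x, faster ascent in y
for _ in range(5000):
    gx, gy = grad_g(x, y)
    x, y = x - eta_x * gx, y + eta_y * gy   # simultaneous descent-ascent step
print(x, y)                                 # drifts toward the min-max point (0, 0)
```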
no code implementations • 17 May 2019 • Raghav Somani, Navin Goyal, Prateek Jain, Praneeth Netrapalli
This paper proposes and demonstrates a surprising pattern in the training of neural networks: there is a one-to-one relation between the values of any pair of losses (such as cross entropy, mean squared error, 0/1 error, etc.)
no code implementations • ICLR 2019 • Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli
One plausible explanation is that non-convex neural network training procedures are better suited to fundamentally different learning rate schedules, such as the "cut the learning rate every constant number of epochs" method (which more closely resembles an exponentially decaying learning rate schedule). Note that this widely used schedule is in stark contrast to the polynomial decay schemes prescribed in the stochastic approximation literature, which are indeed shown to be (worst-case) optimal for classes of convex optimization problems.
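The two schedule families contrasted above, side by side; the constants are arbitrary illustrations, not tuned values from the paper.

```python
# Step decay ("cut the learning rate every constant number of epochs") vs polynomial decay.
def step_decay(epoch, lr0=0.1, drop=0.5, every=30):
    return lr0 * drop ** (epoch // every)

def poly_decay(epoch, lr0=0.1, alpha=0.5):
    return lr0 / (epoch + 1) ** alpha

print([round(step_decay(t), 4) for t in (0, 30, 60, 90)])   # [0.1, 0.05, 0.025, 0.0125]
print([round(poly_decay(t), 4) for t in (0, 30, 60, 90)])   # [0.1, 0.018, 0.0128, 0.0105]
```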
1 code implementation • NeurIPS 2019 • Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli
First, this work shows that even if the time horizon $T$ (i.e., the number of iterations SGD is run for) is known in advance, SGD's final iterate behavior with any polynomially decaying learning rate scheme is highly sub-optimal compared to the minimax rate (by a condition number factor in the strongly convex case and a factor of $\sqrt{T}$ in the non-strongly convex case).
no code implementations • 29 Apr 2019 • Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli
While classical theoretical analysis of SGD for convex problems studies (suffix) averages of iterates and obtains information-theoretically optimal bounds on suboptimality, the last point of SGD is, by far, the most preferred choice in practice.
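A quick experiment contrasting the two choices on a toy least-squares problem: the suffix (tail) average of the iterates typically has noticeably smaller error than the last iterate. Problem size, step size, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 10, 20000
w_star = rng.standard_normal(d)

w = np.zeros(d)
iterates = np.zeros((T, d))
for t in range(T):
    x = rng.standard_normal(d)
    y = x @ w_star + 0.1 * rng.standard_normal()
    w -= 0.01 * (x @ w - y) * x                  # one-sample SGD step on the squared loss
    iterates[t] = w

last = iterates[-1]
suffix_avg = iterates[T // 2:].mean(axis=0)      # average of the last half of the iterates
print(np.linalg.norm(last - w_star), np.linalg.norm(suffix_avg - w_star))
```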
no code implementations • 19 Mar 2019 • Arun Sai Suggala, Praneeth Netrapalli
We show that the classical Follow the Perturbed Leader (FTPL) algorithm achieves optimal regret rate of $O(T^{-1/2})$ in this setting.
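Schematically, FTPL picks each action by handing an offline optimization oracle the cumulative past loss plus a random linear perturbation. The brute-force grid "oracle", the exponential perturbation scale, and the adversary's losses below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, eta = 2, 50, 10.0
grid = np.stack(np.meshgrid(*[np.linspace(0, 1, 21)] * d), axis=-1).reshape(-1, d)
past_losses = []

def oracle(objective):
    vals = np.array([objective(w) for w in grid])   # placeholder "offline optimizer": grid search
    return grid[np.argmin(vals)]

for t in range(T):
    sigma = rng.exponential(scale=eta, size=d)      # fresh random linear perturbation
    w_t = oracle(lambda w: sum(f(w) for f in past_losses) - sigma @ w)
    loss_t = lambda w, c=rng.uniform(size=d): np.sin(3 * w @ c)   # adversary's (nonconvex) loss
    past_losses.append(loss_t)
```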
no code implementations • 4 Mar 2019 • Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli
For small $K$, we show that SGD without replacement achieves the same convergence rate as SGD for general smooth, strongly convex functions.
no code implementations • 13 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
More recent theory has shown that GD and SGD can avoid saddle points, but the dependence on dimension in these analyses is polynomial.
no code implementations • 11 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
In this note, we derive concentration inequalities for random vectors with subGaussian norm (a generalization of both subGaussian random vectors and norm bounded random vectors), which are tight up to logarithmic factors.
1 code implementation • ICML 2020 • Chi Jin, Praneeth Netrapalli, Michael I. Jordan
Minimax optimization has found extensive applications in modern machine learning, in settings such as generative adversarial networks (GANs), adversarial training and multi-agent reinforcement learning.
no code implementations • NeurIPS 2018 • Raghav Somani, Chirag Gupta, Prateek Jain, Praneeth Netrapalli
This paper studies the problem of sparse regression where the goal is to learn a sparse vector that best optimizes a given objective function.
no code implementations • 27 Sep 2018 • Vivek Gupta, Ankit Kumar Saw, Partha Pratim Talukdar, Praneeth Netrapalli
One reason for this degradation is that a longer document is likely to contain words from many different themes (or topics), and hence creating a single vector while ignoring all the thematic structure is unlikely to yield an effective representation of the document.
2 code implementations • ICLR 2018 • Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade
Extensive empirical results in this paper show that ASGD has performance gains over HB, NAG, and SGD.
no code implementations • 1 Mar 2018 • Srinadh Bhojanapalli, Nicolas Boumal, Prateek Jain, Praneeth Netrapalli
Semidefinite programs (SDPs) are important in learning and combinatorial optimization, with numerous applications.
no code implementations • ICLR 2018 • Rahul Anand Sharma, Navin Goyal, Monojit Choudhury, Praneeth Netrapalli
This paper explores the simplicity of learned neural networks under various settings: learned on real vs random data, varying size/architecture and using large minibatch size vs small minibatch size.
no code implementations • 28 Nov 2017 • Chi Jin, Praneeth Netrapalli, Michael I. Jordan
Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods", provably achieves faster convergence rate than gradient descent (GD) in the convex setting.
no code implementations • 22 Nov 2017 • Naman Agarwal, Sham Kakade, Rahul Kidambi, Yin Tat Lee, Praneeth Netrapalli, Aaron Sidford
Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a vector $b \in\mathbb{R}^{d}$, we show how to compute an $\epsilon$-approximate solution to the regression problem $ \min_{x\in\mathbb{R}^{d}}\frac{1}{2} \|\mathbf{A} x - b\|_{2}^{2} $ in time $ \tilde{O} ((n+\sqrt{d\cdot\kappa_{\text{sum}}})\cdot s\cdot\log\epsilon^{-1}) $ where $\kappa_{\text{sum}}=\mathrm{tr}\left(\mathbf{A}^{\top}\mathbf{A}\right)/\lambda_{\min}(\mathbf{A}^{\top}\mathbf{A})$ and $s$ is the maximum number of non-zero entries in a row of $\mathbf{A}$.
no code implementations • 25 Oct 2017 • Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, Aaron Sidford
This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares.
no code implementations • 26 Apr 2017 • Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford
There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g., Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014.
no code implementations • 13 Apr 2017 • Cameron Musco, Praneeth Netrapalli, Aaron Sidford, Shashanka Ubaru, David P. Woodruff
We thus effectively compute a histogram of the spectrum, which can stand in for the true singular values in many applications.
no code implementations • ICML 2017 • Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free").
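A bare-bones caricature of perturbed gradient descent: run plain GD, and when the gradient is small (a candidate saddle point), inject a small random perturbation. The full algorithm also limits how often it perturbs and checks for sufficient decrease; the thresholds, step size, and test function below are placeholders.

```python
import numpy as np

def perturbed_gd(grad, x0, eta=0.01, g_thresh=1e-3, radius=1e-2, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) <= g_thresh:
            x = x + radius * rng.standard_normal(x.shape)   # small gradient: inject noise
        else:
            x = x - eta * g                                  # otherwise: plain gradient step
    return x

# f(u, v) = u^2 - v^2 + v^4/4 has a strict saddle at the origin and minima at v = ±sqrt(2).
grad_f = lambda z: np.array([2 * z[0], -2 * z[1] + z[1] ** 3])
print(perturbed_gd(grad_f, [0.0, 0.0]))                      # escapes the saddle toward (0, ±1.414)
```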
no code implementations • 18 Feb 2017 • Yeshwanth Cherapanamjeri, Prateek Jain, Praneeth Netrapalli
That is, given a data matrix $M^*$, where $(1-\alpha)$ fraction of the points are noisy samples from a low-dimensional subspace while $\alpha$ fraction of the points can be arbitrary outliers, the goal is to recover the subspace accurately.
4 code implementations • 12 Oct 2016 • Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford
In particular, this work provides a sharp analysis of: (1) mini-batching, a method of averaging many samples of a stochastic gradient both to reduce the variance of the stochastic gradient estimate and to parallelize SGD, and (2) tail-averaging, a method involving averaging the final few iterates of SGD to decrease the variance in SGD's final iterate.
no code implementations • 26 May 2016 • Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford
We give faster algorithms and improved sample complexities for estimating the top eigenvector of a matrix $\Sigma$ -- i.e., computing a unit vector $x$ such that $x^T \Sigma x \ge (1-\epsilon)\lambda_1(\Sigma)$: Offline Eigenvector Estimation: Given an explicit $A \in \mathbb{R}^{n \times d}$ with $\Sigma = A^TA$, we show how to compute an $\epsilon$-approximate top eigenvector in time $\tilde O\left(\left[\mathrm{nnz}(A) + \frac{d \cdot \mathrm{sr}(A)}{\mathrm{gap}^2}\right] \cdot \log(1/\epsilon)\right)$ and $\tilde O\left(\frac{\mathrm{nnz}(A)^{3/4}\,(d \cdot \mathrm{sr}(A))^{1/4}}{\sqrt{\mathrm{gap}}} \cdot \log(1/\epsilon)\right)$.
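For reference, the classical power method on $\Sigma = A^T A$ is the natural baseline whose iteration count scales like $(1/\mathrm{gap}) \cdot \log(1/\epsilon)$, in contrast to the improved gap and nnz dependencies above; matrix size and tolerance in this sketch are arbitrary.

```python
import numpy as np

def top_eigvec_power(A, eps=1e-6, max_iters=10000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(max_iters):
        y = A.T @ (A @ x)                  # multiply by Sigma = A^T A without forming it
        y /= np.linalg.norm(y)
        if np.linalg.norm(y - x) < eps:
            break
        x = y
    return x

A = np.random.default_rng(1).standard_normal((200, 50))
x = top_eigvec_power(A)
print(x @ (A.T @ (A @ x)))                 # Rayleigh quotient ≈ top eigenvalue of A^T A
```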
no code implementations • NeurIPS 2016 • Chi Jin, Sham M. Kakade, Praneeth Netrapalli
While existing algorithms are efficient for the offline setting, they could be highly inefficient for the online setting.
no code implementations • 13 Apr 2016 • Rong Ge, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford
Our algorithm is linear in the input size and the number of components $k$ up to a $\log(k)$ factor.
no code implementations • 22 Feb 2016 • Prateek Jain, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford
This work provides improved guarantees for streaming principal component analysis (PCA).
no code implementations • NeurIPS 2015 • Kamalika Chaudhuri, Sham M. Kakade, Praneeth Netrapalli, Sujay Sanghavi
Provided certain conditions hold on the model class, we provide a two-stage active learning algorithm for this problem.
no code implementations • 29 Oct 2015 • Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford
Combining our algorithm with previous work to initialize $x_0$, we obtain a number of improved sample complexity and runtime results.
no code implementations • 3 Feb 2015 • Jason K. Johnson, Diane Oyen, Michael Chertkov, Praneeth Netrapalli
Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering.
no code implementations • 4 Nov 2014 • Prateek Jain, Praneeth Netrapalli
In this paper, we present a fast iterative algorithm that solves the matrix completion problem by observing $O(nr^5 \log^3 n)$ entries, which is independent of the condition number and the desired accuracy.
no code implementations • NeurIPS 2014 • Praneeth Netrapalli, U. N. Niranjan, Sujay Sanghavi, Animashree Anandkumar, Prateek Jain
In contrast, existing methods for robust PCA, which are based on convex optimization, have $O(m^2n)$ complexity per iteration and take $O(1/\epsilon)$ iterations, i.e., exponentially more iterations for the same accuracy.
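A condensed sketch of the non-convex alternating-projections idea for robust PCA ($M \approx L + S$ with $L$ low rank and $S$ sparse): alternately hard-threshold the residual to update the sparse part and truncate an SVD to update the low-rank part. The fixed rank, fixed threshold, and iteration count are simplifications of the staged schedule used in the paper.

```python
import numpy as np

def robust_pca_altproj(M, rank, thresh, iters=30):
    L = np.zeros_like(M)
    for _ in range(iters):
        S = np.where(np.abs(M - L) > thresh, M - L, 0.0)     # sparse part: hard thresholding
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]             # low-rank part: rank-r truncation
    return L, S
```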
no code implementations • 30 Oct 2013 • Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli
Alternating minimization is a popular heuristic for sparse coding, where the dictionary and the coefficients are estimated in alternate steps, keeping the other fixed.
no code implementations • 8 Sep 2013 • Alekh Agarwal, Animashree Anandkumar, Praneeth Netrapalli
We consider the problem of learning overcomplete dictionaries in the context of sparse coding, where each sample selects a sparse subset of dictionary elements.
1 code implementation • NeurIPS 2013 • Praneeth Netrapalli, Prateek Jain, Sujay Sanghavi
Empirically, we demonstrate that alternating minimization performs similar to recently proposed convex techniques for this problem (which are based on "lifting" to a convex matrix problem) in sample complexity and robustness to noise.
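A stripped-down sketch of the alternating-minimization idea for phase retrieval from magnitude measurements $y_i = |\langle a_i, x^*\rangle|$: alternately impute the missing signs given the current estimate and refit by least squares. The problem sizes and iteration count are illustrative, and the initialization below is a simplified version of the spectral initialization used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 200
x_star = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x_star)                               # magnitude-only (phaseless) measurements

# Spectral initialization: top eigenvector of (1/m) * sum_i y_i^2 a_i a_i^T, rescaled.
W = (A * (y ** 2)[:, None]).T @ A / m
x = np.linalg.eigh(W)[1][:, -1] * np.sqrt(np.mean(y ** 2))

for _ in range(50):
    signs = np.sign(A @ x)                           # step 1: impute the missing signs
    x = np.linalg.lstsq(A, signs * y, rcond=None)[0] # step 2: refit by least squares
print(min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star)))  # error up to a global sign
```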