no code implementations • 31 Jul 2023 • Charlie Hou, Kiran Koshy Thekumparampil, Michael Shavlovsky, Giulia Fanti, Yesh Dattatreya, Sujay Sanghavi

Most of the recent performance gains attained by DL models in text and image tasks have used unsupervised pretraining, which exploits orders of magnitude more unlabeled data than labeled data.

no code implementations • 15 Jun 2023 • Alexia Atsidakou, Branislav Kveton, Sumeet Katariya, Constantine Caramanis, Sujay Sanghavi

In Gaussian bandits, we obtain $O(c_\Delta \log n)$ and $O(c_h \log^2 n)$ bounds for an upper confidence bound algorithm, where $c_h$ and $c_\Delta$ are constants depending on the prior distribution and the gaps of random bandit instances sampled from it, respectively.

no code implementations • 5 Jun 2023 • Sunny Sanyal, Jean Kaddour, Abhishek Kumar, Sujay Sanghavi

Training LLMs is expensive, and recent evidence indicates training all the way to convergence is inefficient.

no code implementations • 30 Jan 2023 • Rudrajit Das, Sujay Sanghavi

Self-distillation (SD) is the process of first training a "teacher" model and then using its predictions to train a "student" model with the same architecture.

no code implementations • 17 Dec 2022 • Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai

Theoretically, we establish the sample complexity of the proposed approach in the online and offline settings.

Model-based Reinforcement Learning

no code implementations • 15 Nov 2022 • Alexia Atsidakou, Sumeet Katariya, Sujay Sanghavi, Branislav Kveton

We also provide a lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches it for any budget.

no code implementations • 19 Sep 2022 • Shuo Yang, Sujay Sanghavi, Holakou Rahmanian, Jan Bakus, S. V. N. Vishwanathan

Such features naturally arise in merchandised recommendation systems; for instance, "user clicked this item" as a feature is predictive of "user purchased this item" in the offline data, but is clearly not available during online serving.

no code implementations • 11 Aug 2022 • Nan Jiang, Dhivya Eswaran, Choon Hui Teo, Yexiang Xue, Yesh Dattatreya, Sujay Sanghavi, Vishy Vishwanathan

We consider text retrieval within dense representational space in real-world settings such as e-commerce search where (a) document popularity and (b) diversity of queries associated with a document have a skewed distribution.

no code implementations • 21 Jun 2022 • Rudrajit Das, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi

Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded.

no code implementations • 1 Jun 2022 • Anish Acharya, Sujay Sanghavi, Li Jing, Bhargav Bhushanam, Michael Rabbat, Inderjit Dhillon

We extend this paradigm to the classical positive unlabeled (PU) setting, where the task is to learn a binary classifier given only a few labeled positive samples, and (often) a large amount of unlabeled samples (which could be positive or negative).

no code implementations • 23 May 2022 • Tongzheng Ren, Fuheng Cui, Sujay Sanghavi, Nhat Ho

However, when the models are over-specified, namely, the chosen number of components to fit the data is larger than the unknown true number of components, EM needs a polynomial number of iterations in terms of the sample size to reach the final statistical radius; this is computationally expensive in practice.

no code implementations • 16 May 2022 • Nhat Ho, Tongzheng Ren, Sujay Sanghavi, Purnamrita Sarkar, Rachel Ward

Therefore, the total computational complexity of the EGD algorithm is optimal and exponentially cheaper than that of GD for solving parameter estimation in non-regular statistical models, while being comparable to that of GD in regular statistical settings.

no code implementations • 23 Mar 2022 • Daniel Vial, Sujay Sanghavi, Sanjay Shakkottai, R. Srikant

Cascading bandits is a natural and popular model that frames the task of learning to rank from Bernoulli click feedback in a bandit setting.

no code implementations • 24 Feb 2022 • Shuo Yang, Yijun Dong, Rachel Ward, Inderjit S. Dhillon, Sujay Sanghavi, Qi Lei

Data augmentation is popular in the training of large neural networks; currently, however, there is no clear theoretical comparison between different algorithmic choices on how to use augmented data.

no code implementations • 9 Feb 2022 • Tongzheng Ren, Jiacheng Zhuo, Sujay Sanghavi, Nhat Ho

This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau > 1$, to reach the same statistical radius.

no code implementations • 15 Oct 2021 • Tongzheng Ren, Fuheng Cui, Alexia Atsidakou, Sujay Sanghavi, Nhat Ho

We study the statistical and computational complexities of the Polyak step size gradient descent algorithm under generalized smoothness and Lojasiewicz conditions of the population loss function, namely, the limit of the empirical loss function when the sample size goes to infinity, and the stability between the gradients of the empirical and population loss functions, namely, the polynomial growth on the concentration bound between the gradients of sample and population loss functions.

no code implementations • 29 Sep 2021 • Shuo Yang, Yijun Dong, Rachel Ward, Inderjit S Dhillon, Sujay Sanghavi, Qi Lei

Data augmentation is popular in the training of large neural networks; currently, however, there is no clear theoretical comparison between different algorithmic choices on how to use augmented data.

2 code implementations • 16 Jun 2021 • Anish Acharya, Abolfazl Hashemi, Prateek Jain, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

Geometric median (GM) is a classical method in statistics for achieving a robust estimation of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5.

Ranked #19 on Image Classification on MNIST (Accuracy metric)

no code implementations • 13 Jun 2021 • Rudrajit Das, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon

The primary reason for this is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius called the clipping threshold) for bounding the sensitivity of the average update to each client's update introduces bias depending on the clipping threshold and the number of local steps in FL, and analyzing this is not easy.

no code implementations • 1 Jun 2021 • Tavor Z. Baharav, Daniel L. Jiang, Kedarnath Kolluri, Sujay Sanghavi, Inderjit S. Dhillon

For such applications, a common approach is to organize these labels into a tree, enabling training and inference times that are logarithmic in the number of labels.

no code implementations • NeurIPS 2021 • Tongzheng Ren, Jialian Li, Bo Dai, Simon S. Du, Sujay Sanghavi

To the best of our knowledge, these are the first set of nearly horizon-free bounds for episodic time-homogeneous offline tabular MDP and linear MDP with anchor points.

no code implementations • 3 Mar 2021 • Shuo Yang, Tongzheng Ren, Sanjay Shakkottai, Eric Price, Inderjit S. Dhillon, Sujay Sanghavi

For sufficiently large $K$, our algorithms have sublinear per-step complexity and $\tilde O(\sqrt{T})$ regret.

no code implementations • 3 Mar 2021 • Shuo Yang, Tongzheng Ren, Inderjit S. Dhillon, Sujay Sanghavi

Specifically, we focus on a challenging setting where 1) the reward distribution of an arm depends on the set $s$ it is part of, and crucially 2) there is no total order for the arms in $\mathcal{A}$.

no code implementations • 7 Dec 2020 • Rudrajit Das, Anish Acharya, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

We propose FedGLOMO, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq \epsilon$) for smooth non-convex functions, under arbitrary client heterogeneity and compressed communication, compared to the $\mathcal{O}(\epsilon^{-2})$ complexity of most prior works.

no code implementations • 28 Nov 2020 • Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi

In this paper, we aim to characterize the performance of adaptive methods in the over-parameterized linear regression setting.

1 code implementation • 20 Nov 2020 • Abolfazl Hashemi, Anish Acharya, Rudrajit Das, Haris Vikalo, Sujay Sanghavi, Inderjit Dhillon

In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having multiple gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by reducing the precision of compressed information.

no code implementations • ICML 2020 • Yanyao Shen, Hsiang-Fu Yu, Sujay Sanghavi, Inderjit Dhillon

Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC sizes.

1 code implementation • 10 Jan 2020 • Vatsal Shah, Xiaoxia Wu, Sujay Sanghavi

The presence of outliers can potentially significantly skew the parameters of machine learning models trained via stochastic gradient descent (SGD).

no code implementations • NeurIPS 2019 • Shuo Yang, Yanyao Shen, Sujay Sanghavi

In this paper, we provide a new algorithm, Interaction Hard Thresholding (IntHT), which is the first to provably accurately solve this problem in sub-quadratic time and space.

1 code implementation • NeurIPS 2019 • Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi

We give a simple algorithm to estimate the parameters (i.e., the weight matrix and bias vector of the ReLU neural network) up to an error $\epsilon\|W\|_F$ using $\tilde{O}(1/\epsilon^2)$ samples and $\tilde{O}(d^2/\epsilon^2)$ time (log factors are ignored for simplicity).

no code implementations • NeurIPS 2019 • Soumya Basu, Rajat Sen, Sujay Sanghavi, Sanjay Shakkottai

We show that, with prior knowledge of the rewards and delays of all the arms, the problem of optimizing cumulative reward does not admit any pseudo-polynomial time algorithm (in the number of arms) unless the randomized exponential time hypothesis is false, by mapping to the PINWHEEL scheduling problem.

no code implementations • NeurIPS 2019 • Yanyao Shen, Sujay Sanghavi

We then evaluate it for the widely studied setting of isotropic Gaussian features, and establish that we match or better existing results in terms of sample complexity.

1 code implementation • 26 Jan 2019 • Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez

State-of-the-art convolutional neural networks (CNNs) used in vision applications have large models with numerous weights.

no code implementations • 16 Nov 2018 • Vatsal Shah, Anastasios Kyrillidis, Sujay Sanghavi

We empirically show that the minimum weight norm is not necessarily the proper gauge of good generalization in simplified scenarios, and different models found by adaptive methods could outperform plain gradient methods.

no code implementations • 28 Oct 2018 • Yanyao Shen, Sujay Sanghavi

In this paper, we study a simple and generic framework to tackle the problem of learning model parameters when a fraction of the training samples are corrupted.

1 code implementation • NeurIPS 2019 • Shanshan Wu, Sujay Sanghavi, Alexandros G. Dimakis

We show that this algorithm can recover any arbitrary discrete pairwise graphical model, and also characterize its sample complexity as a function of model width, alphabet size, edge parameter accuracy, and the number of variables.

no code implementations • 27 Sep 2018 • Yanyao Shen, Sujay Sanghavi

We study a simple generic framework to address the issue of bad training data; both bad labels in supervised problems, and bad samples in unsupervised ones.

1 code implementation • 26 Jun 2018 • Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi, Felix X. Yu, Daniel Holtmann-Rice, Dmitry Storcheus, Afshin Rostamizadeh, Sanjiv Kumar

Our experiments show that there is indeed additional structure beyond sparsity in the real datasets; our method is able to discover it and exploit it to create excellent reconstructions with fewer measurements (by a factor of 1.1-3x) compared to the previous state-of-the-art methods.

no code implementations • 8 Mar 2017 • Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, Sujay Sanghavi

We consider support recovery in the quadratic logistic regression setting, where the target depends on both $p$ linear terms $x_i$ and up to $p^2$ quadratic terms $x_i x_j$.

no code implementations • NeurIPS 2016 • Yanyao Shen, Qi-Xing Huang, Nati Srebro, Sujay Sanghavi

The algorithmic advancement of synchronizing maps is important for solving a wide range of practical problems with possibly large-scale datasets.

1 code implementation • NeurIPS 2016 • Shanshan Wu, Srinadh Bhojanapalli, Sujay Sanghavi, Alexandros G. Dimakis

In this paper we present a new algorithm for computing a low rank approximation of the product $A^TB$ by taking only a single pass of the two matrices $A$ and $B$.

no code implementations • 4 Oct 2016 • Avik Ray, Joe Neeman, Sujay Sanghavi, Sanjay Shakkottai

We consider the task of learning the parameters of a single component of a mixture model, for the case when we are given side information about that component; we call this the "search problem" in mixture models.

no code implementations • 12 Sep 2016 • Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions.

no code implementations • 19 Aug 2016 • Xinyang Yi, Constantine Caramanis, Sujay Sanghavi

We give a tractable algorithm for the mixed linear equation problem, and show that under some technical conditions, our algorithm is guaranteed to solve the problem exactly with sample complexity linear in the dimension, and polynomial in $k$, the number of components.

no code implementations • 10 Jun 2016 • Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

We study such parameterization for optimization of generic convex objectives $f$, and focus on first-order, gradient descent algorithmic solutions.

no code implementations • 4 Jun 2016 • Dohyung Park, Anastasios Kyrillidis, Srinadh Bhojanapalli, Constantine Caramanis, Sujay Sanghavi

We study the projected gradient descent method on low-rank matrix problems with a strongly convex objective.

no code implementations • 22 Mar 2016 • Vatsal Shah, Megasthenis Asteris, Anastasios Kyrillidis, Sujay Sanghavi

Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration.

no code implementations • NeurIPS 2015 • Kamalika Chaudhuri, Sham M. Kakade, Praneeth Netrapalli, Sujay Sanghavi

Provided certain conditions hold on the model class, we provide a two-stage active learning algorithm for this problem.

no code implementations • 14 Sep 2015 • Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi

To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.

1 code implementation • 16 Jul 2015 • Dohyung Park, Joe Neeman, Jin Zhang, Sujay Sanghavi, Inderjit S. Dhillon

In this paper we consider the collaborative ranking setting: a pool of users each provides a small number of pairwise preferences between $d$ possible items; from these we need to predict preferences of the users for items they have not yet seen.

no code implementations • 25 Jun 2015 • Chris D. White, Sujay Sanghavi, Rachel Ward

This paper considers the recovery of a rank $r$ positive semidefinite matrix $X X^T\in\mathbb{R}^{n\times n}$ from $m$ scalar measurements of the form $y_i := a_i^T X X^T a_i$ (i.e., quadratic measurements of $X$).

no code implementations • NeurIPS 2015 • Kamalika Chaudhuri, Sham Kakade, Praneeth Netrapalli, Sujay Sanghavi

Provided certain conditions hold on the model class, we provide a two-stage active learning algorithm for this problem.

no code implementations • 17 Feb 2015 • Srinadh Bhojanapalli, Sujay Sanghavi

In this paper we propose new techniques to sample arbitrary third-order tensors, with an objective of speeding up tensor algorithms that have recently gained popularity in machine learning.

no code implementations • 7 Nov 2014 • Siddhartha Banerjee, Sujay Sanghavi, Sanjay Shakkottai

We consider this problem under a simple natural model, wherein the number of items and the number of item-views are of the same order, and an 'access-graph' constrains which user is allowed to see which item.

no code implementations • NeurIPS 2014 • Dohyung Park, Constantine Caramanis, Sujay Sanghavi

We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces.

no code implementations • NeurIPS 2014 • Praneeth Netrapalli, U. N. Niranjan, Sujay Sanghavi, Animashree Anandkumar, Prateek Jain

In contrast, existing methods for robust PCA, which are based on convex optimization, have $O(m^2n)$ complexity per iteration, and take $O(1/\epsilon)$ iterations, i.e., exponentially more iterations for the same accuracy.

1 code implementation • 14 Oct 2014 • Srinadh Bhojanapalli, Prateek Jain, Sujay Sanghavi

The first is a new method to directly compute a low-rank approximation (in efficient factored form) to the product of two given matrices; it computes a small random set of entries of the product, and then executes weighted alternating minimization (as before) on these.

no code implementations • 14 Oct 2013 • Xinyang Yi, Constantine Caramanis, Sujay Sanghavi

Mixed linear regression involves the recovery of two (or more) unknown vectors from unlabeled linear measurements; that is, where each sample comes from exactly one of the vectors, but we do not know which one.

no code implementations • 12 Jun 2013 • Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, Rachel Ward

Matrix completion, i.e., the exact and provable recovery of a low-rank matrix from a small subset of its elements, is currently only known to be possible if the matrix satisfies a restrictive structural constraint, known as incoherence, on its row and column spaces.

1 code implementation • NeurIPS 2013 • Praneeth Netrapalli, Prateek Jain, Sujay Sanghavi

Empirically, we demonstrate that alternating minimization performs similarly to recently proposed convex techniques for this problem (which are based on "lifting" to a convex matrix problem) in sample complexity and robustness to noise.

no code implementations • NeurIPS 2012 • Yudong Chen, Sujay Sanghavi, Huan Xu

We develop a new algorithm to cluster sparse unweighted graphs, i.e., to partition the nodes into disjoint clusters so that there is higher density within clusters and lower density across clusters.

no code implementations • 11 Oct 2012 • Yudong Chen, Sujay Sanghavi, Huan Xu

We show that, in the classic stochastic block model setting, it outperforms existing methods by polynomial factors when the cluster size is allowed to have general scalings.

no code implementations • 25 Apr 2011 • Yudong Chen, Ali Jalali, Sujay Sanghavi, Huan Xu

This paper considers the problem of clustering a partially observed unweighted graph, i.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge.

no code implementations • 10 Feb 2011 • Yudong Chen, Huan Xu, Constantine Caramanis, Sujay Sanghavi

Moreover, we show by an information-theoretic argument that our guarantees are nearly optimal in terms of the fraction of sampled entries on the authentic columns, the fraction of corrupted columns, and the rank of the underlying matrix.

no code implementations • NeurIPS 2010 • Ali Jalali, Sujay Sanghavi, Chao Ruan, Pradeep K. Ravikumar

However, these papers also caution that the performance of such block-regularized methods is very dependent on the extent to which the features are shared across tasks.

1 code implementation • NeurIPS 2010 • Huan Xu, Constantine Caramanis, Sujay Sanghavi

Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers.

no code implementations • NeurIPS 2007 • Sujay Sanghavi, Dmitry Malioutov, Alan S. Willsky

Loopy belief propagation has been employed in a wide variety of applications with great empirical success, but it comes with few theoretical guarantees.
