Search Results for author: Sujay Sanghavi

Found 67 papers, 12 papers with code

Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

no code implementations 31 Jul 2023 Charlie Hou, Kiran Koshy Thekumparampil, Michael Shavlovsky, Giulia Fanti, Yesh Dattatreya, Sujay Sanghavi

Most of the recent performance gains attained by DL models in text and image tasks have used unsupervised pretraining, which exploits orders of magnitude more unlabeled data than labeled data.


Finite-Time Logarithmic Bayes Regret Upper Bounds

no code implementations 15 Jun 2023 Alexia Atsidakou, Branislav Kveton, Sumeet Katariya, Constantine Caramanis, Sujay Sanghavi

In Gaussian bandits, we obtain $O(c_\Delta \log n)$ and $O(c_h \log^2 n)$ bounds for an upper confidence bound algorithm, where $c_h$ and $c_\Delta$ are constants depending on the prior distribution and the gaps of random bandit instances sampled from it, respectively.

Understanding the Effectiveness of Early Weight Averaging for Training Large Language Models

no code implementations 5 Jun 2023 Sunny Sanyal, Jean Kaddour, Abhishek Kumar, Sujay Sanghavi

Training LLMs is expensive, and recent evidence indicates training all the way to convergence is inefficient.

Understanding Self-Distillation in the Presence of Label Noise

no code implementations 30 Jan 2023 Rudrajit Das, Sujay Sanghavi

Self-distillation (SD) is the process of first training a "teacher" model and then using its predictions to train a "student" model with the same architecture.

Binary Classification, regression

Bayesian Fixed-Budget Best-Arm Identification

no code implementations 15 Nov 2022 Alexia Atsidakou, Sumeet Katariya, Sujay Sanghavi, Branislav Kveton

We also provide a lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches it for any budget.

Toward Understanding Privileged Features Distillation in Learning-to-Rank

no code implementations 19 Sep 2022 Shuo Yang, Sujay Sanghavi, Holakou Rahmanian, Jan Bakus, S. V. N. Vishwanathan

Such features naturally arise in merchandised recommendation systems; for instance, "user clicked this item" as a feature is predictive of "user purchased this item" in the offline data, but is clearly not available during online serving.

Learning-To-Rank, Recommendation Systems

On the Value of Behavioral Representations for Dense Retrieval

no code implementations 11 Aug 2022 Nan Jiang, Dhivya Eswaran, Choon Hui Teo, Yexiang Xue, Yesh Dattatreya, Sujay Sanghavi, Vishy Vishwanathan

We consider text retrieval within dense representational space in real-world settings such as e-commerce search where (a) document popularity and (b) diversity of queries associated with a document have a skewed distribution.

Retrieval, Text Retrieval

Beyond Uniform Lipschitz Condition in Differentially Private Optimization

no code implementations 21 Jun 2022 Rudrajit Das, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi

Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded.

Benchmarking, regression

Positive Unlabeled Contrastive Learning

no code implementations 1 Jun 2022 Anish Acharya, Sujay Sanghavi, Li Jing, Bhargav Bhushanam, Michael Rabbat, Inderjit Dhillon

We extend this paradigm to the classical positive unlabeled (PU) setting, where the task is to learn a binary classifier given only a few labeled positive samples, and (often) a large amount of unlabeled samples (which could be positive or negative).

Contrastive Learning, Pseudo Label

Beyond EM Algorithm on Over-specified Two-Component Location-Scale Gaussian Mixtures

no code implementations 23 May 2022 Tongzheng Ren, Fuheng Cui, Sujay Sanghavi, Nhat Ho

However, when the models are over-specified, namely, the chosen number of components to fit the data is larger than the unknown true number of components, EM needs a polynomial number of iterations in terms of the sample size to reach the final statistical radius; this is computationally expensive in practice.

Open-Ended Question Answering

An Exponentially Increasing Step-size for Parameter Estimation in Statistical Models

no code implementations 16 May 2022 Nhat Ho, Tongzheng Ren, Sujay Sanghavi, Purnamrita Sarkar, Rachel Ward

Therefore, the total computational complexity of the EGD algorithm is optimal and exponentially cheaper than that of the GD for solving parameter estimation in non-regular statistical models while being comparable to that of the GD in regular statistical settings.

Minimax Regret for Cascading Bandits

no code implementations 23 Mar 2022 Daniel Vial, Sujay Sanghavi, Sanjay Shakkottai, R. Srikant

Cascading bandits is a natural and popular model that frames the task of learning to rank from Bernoulli click feedback in a bandit setting.


Sample Efficiency of Data Augmentation Consistency Regularization

no code implementations 24 Feb 2022 Shuo Yang, Yijun Dong, Rachel Ward, Inderjit S. Dhillon, Sujay Sanghavi, Qi Lei

Data augmentation is popular in the training of large neural networks; currently, however, there is no clear theoretical comparison between different algorithmic choices on how to use augmented data.

Data Augmentation, Generalization Bounds

Improving Computational Complexity in Statistical Models with Second-Order Information

no code implementations 9 Feb 2022 Tongzheng Ren, Jiacheng Zhuo, Sujay Sanghavi, Nhat Ho

This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau > 1$, to reach the same statistical radius.

Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent

no code implementations 15 Oct 2021 Tongzheng Ren, Fuheng Cui, Alexia Atsidakou, Sujay Sanghavi, Nhat Ho

We study the statistical and computational complexities of the Polyak step size gradient descent algorithm under generalized smoothness and Lojasiewicz conditions of the population loss function, namely, the limit of the empirical loss function when the sample size goes to infinity, and the stability between the gradients of the empirical and population loss functions, namely, the polynomial growth on the concentration bound between the gradients of sample and population loss functions.

Theoretical Analysis of Consistency Regularization with Limited Augmented Data

no code implementations 29 Sep 2021 Shuo Yang, Yijun Dong, Rachel Ward, Inderjit S Dhillon, Sujay Sanghavi, Qi Lei

Data augmentation is popular in the training of large neural networks; currently, however, there is no clear theoretical comparison between different algorithmic choices on how to use augmented data.

Data Augmentation, Generalization Bounds, +1

Robust Training in High Dimensions via Block Coordinate Geometric Median Descent

2 code implementations 16 Jun 2021 Anish Acharya, Abolfazl Hashemi, Prateek Jain, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

Geometric median (GM) is a classical method in statistics for achieving a robust estimation of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5.

Ranked #19 on Image Classification on MNIST (Accuracy metric)

Image Classification, Vocal Bursts Intensity Prediction

On the Convergence of Differentially Private Federated Learning on Non-Lipschitz Objectives, and with Normalized Client Updates

no code implementations 13 Jun 2021 Rudrajit Das, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon

The primary reason for this is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius called the clipping threshold) for bounding the sensitivity of the average update to each client's update introduces bias depending on the clipping threshold and the number of local steps in FL, and analyzing this is not easy.

Benchmarking, Federated Learning, +1

Enabling Efficiency-Precision Trade-offs for Label Trees in Extreme Classification

no code implementations 1 Jun 2021 Tavor Z. Baharav, Daniel L. Jiang, Kedarnath Kolluri, Sujay Sanghavi, Inderjit S. Dhillon

For such applications, a common approach is to organize these labels into a tree, enabling training and inference times that are logarithmic in the number of labels.

Extreme Multi-Label Classification, TAG

Nearly Horizon-Free Offline Reinforcement Learning

no code implementations NeurIPS 2021 Tongzheng Ren, Jialian Li, Bo Dai, Simon S. Du, Sujay Sanghavi

To the best of our knowledge, these are the first set of nearly horizon-free bounds for episodic time-homogeneous offline tabular MDP and linear MDP with anchor points.

reinforcement-learning, Reinforcement Learning (RL)

Linear Bandit Algorithms with Sublinear Time Complexity

no code implementations 3 Mar 2021 Shuo Yang, Tongzheng Ren, Sanjay Shakkottai, Eric Price, Inderjit S. Dhillon, Sujay Sanghavi

For sufficiently large $K$, our algorithms have sublinear per-step complexity and $\tilde O(\sqrt{T})$ regret.

Movie Recommendation

Combinatorial Bandits without Total Order for Arms

no code implementations 3 Mar 2021 Shuo Yang, Tongzheng Ren, Inderjit S. Dhillon, Sujay Sanghavi

Specifically, we focus on a challenging setting where 1) the reward distribution of an arm depends on the set $s$ it is part of, and crucially 2) there is no total order for the arms in $\mathcal{A}$.

Faster Non-Convex Federated Learning via Global and Local Momentum

no code implementations 7 Dec 2020 Rudrajit Das, Anish Acharya, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

We propose FedGLOMO, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq \epsilon$) for smooth non-convex functions, under arbitrary client heterogeneity and compressed communication, compared to the $\mathcal{O}(\epsilon^{-2})$ complexity of most prior works.

Federated Learning

On Generalization of Adaptive Methods for Over-parameterized Linear Regression

no code implementations 28 Nov 2020 Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi

In this paper, we aim to characterize the performance of adaptive methods in the over-parameterized linear regression setting.


On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization

1 code implementation 20 Nov 2020 Abolfazl Hashemi, Anish Acharya, Rudrajit Das, Haris Vikalo, Sujay Sanghavi, Inderjit Dhillon

In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having multiple gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by means of reducing the precision of compressed information.

Extreme Multi-label Classification from Aggregated Labels

no code implementations ICML 2020 Yanyao Shen, Hsiang-Fu Yu, Sujay Sanghavi, Inderjit Dhillon

Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC sizes.

Classification, Extreme Multi-Label Classification, +1

Choosing the Sample with Lowest Loss makes SGD Robust

1 code implementation 10 Jan 2020 Vatsal Shah, Xiaoxia Wu, Sujay Sanghavi

The presence of outliers can potentially significantly skew the parameters of machine learning models trained via stochastic gradient descent (SGD).


Interaction Hard Thresholding: Consistent Sparse Quadratic Regression in Sub-quadratic Time and Space

no code implementations NeurIPS 2019 Shuo Yang, Yanyao Shen, Sujay Sanghavi

In this paper, we provide a new algorithm, Interaction Hard Thresholding (IntHT), which is the first to provably and accurately solve this problem in sub-quadratic time and space.


Learning Distributions Generated by One-Layer ReLU Networks

1 code implementation NeurIPS 2019 Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi

We give a simple algorithm to estimate the parameters (i.e., the weight matrix and bias vector of the ReLU neural network) up to an error $\epsilon\|W\|_F$ using $\tilde{O}(1/\epsilon^2)$ samples and $\tilde{O}(d^2/\epsilon^2)$ time (log factors are ignored for simplicity).

Blocking Bandits

no code implementations NeurIPS 2019 Soumya Basu, Rajat Sen, Sujay Sanghavi, Sanjay Shakkottai

We show that with prior knowledge of the rewards and delays of all the arms, the problem of optimizing cumulative reward does not admit any pseudo-polynomial time algorithm (in the number of arms) unless the randomized exponential time hypothesis is false, by mapping to the PINWHEEL scheduling problem.

Blocking, Product Recommendation, +1

Iterative Least Trimmed Squares for Mixed Linear Regression

no code implementations NeurIPS 2019 Yanyao Shen, Sujay Sanghavi

We then evaluate it for the widely studied setting of isotropic Gaussian features, and establish that we match or better existing results in terms of sample complexity.


PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration

1 code implementation 26 Jan 2019 Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez

State-of-the-art convolutional neural networks (CNNs) used in vision applications have large models with numerous weights.

Minimum weight norm models do not always generalize well for over-parameterized problems

no code implementations 16 Nov 2018 Vatsal Shah, Anastasios Kyrillidis, Sujay Sanghavi

We empirically show that the minimum weight norm is not necessarily the proper gauge of good generalization in simplified scenarios, and different models found by adaptive methods could outperform plain gradient methods.

Learning with Bad Training Data via Iterative Trimmed Loss Minimization

no code implementations 28 Oct 2018 Yanyao Shen, Sujay Sanghavi

In this paper, we study a simple and generic framework to tackle the problem of learning model parameters when a fraction of the training samples are corrupted.

Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models

1 code implementation NeurIPS 2019 Shanshan Wu, Sujay Sanghavi, Alexandros G. Dimakis

We show that this algorithm can recover any arbitrary discrete pairwise graphical model, and also characterize its sample complexity as a function of model width, alphabet size, edge parameter accuracy, and the number of variables.


Iteratively Learning from the Best

no code implementations 27 Sep 2018 Yanyao Shen, Sujay Sanghavi

We study a simple generic framework to address the issue of bad training data: both bad labels in supervised problems, and bad samples in unsupervised ones.

Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

1 code implementation 26 Jun 2018 Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi, Felix X. Yu, Daniel Holtmann-Rice, Dmitry Storcheus, Afshin Rostamizadeh, Sanjiv Kumar

Our experiments show that there is indeed additional structure beyond sparsity in the real datasets; our method is able to discover it and exploit it to create excellent reconstructions with fewer measurements (by a factor of 1.1-3x) compared to the previous state-of-the-art methods.

Extreme Multi-Label Classification, Multi-Label Learning, +1

Sparse Quadratic Logistic Regression in Sub-quadratic Time

no code implementations 8 Mar 2017 Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, Sujay Sanghavi

We consider support recovery in the quadratic logistic regression setting, where the target depends on both $p$ linear terms $x_i$ and up to $p^2$ quadratic terms $x_i x_j$.

regression, Test

Normalized Spectral Map Synchronization

no code implementations NeurIPS 2016 Yanyao Shen, Qi-Xing Huang, Nati Srebro, Sujay Sanghavi

The algorithmic advancement of synchronizing maps is important in order to solve a wide range of practical problems with possibly large-scale datasets.

Single Pass PCA of Matrix Products

1 code implementation NeurIPS 2016 Shanshan Wu, Srinadh Bhojanapalli, Sujay Sanghavi, Alexandros G. Dimakis

In this paper we present a new algorithm for computing a low rank approximation of the product $A^TB$ by taking only a single pass of the two matrices $A$ and $B$.

The Search Problem in Mixture Models

no code implementations 4 Oct 2016 Avik Ray, Joe Neeman, Sujay Sanghavi, Sanjay Shakkottai

We consider the task of learning the parameters of a single component of a mixture model, for the case when we are given side information about that component; we call this the "search problem" in mixture models.

Clustering, Topic Models

Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach

no code implementations 12 Sep 2016 Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions.

Solving a Mixture of Many Random Linear Equations by Tensor Decomposition and Alternating Minimization

no code implementations 19 Aug 2016 Xinyang Yi, Constantine Caramanis, Sujay Sanghavi

We give a tractable algorithm for the mixed linear equation problem, and show that under some technical conditions, our algorithm is guaranteed to solve the problem exactly with sample complexity linear in the dimension, and polynomial in $k$, the number of components.

Tensor Decomposition

Finding Low-Rank Solutions via Non-Convex Matrix Factorization, Efficiently and Provably

no code implementations 10 Jun 2016 Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

We study such parameterization for optimization of generic convex objectives $f$, and focus on first-order, gradient descent algorithmic solutions.

Trading-off variance and complexity in stochastic gradient descent

no code implementations 22 Mar 2016 Vatsal Shah, Megasthenis Asteris, Anastasios Kyrillidis, Sujay Sanghavi

Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration.

Dropping Convexity for Faster Semi-definite Optimization

no code implementations 14 Sep 2015 Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi

To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.

Preference Completion: Large-scale Collaborative Ranking from Pairwise Comparisons

1 code implementation 16 Jul 2015 Dohyung Park, Joe Neeman, Jin Zhang, Sujay Sanghavi, Inderjit S. Dhillon

In this paper we consider the collaborative ranking setting: a pool of users each provides a small number of pairwise preferences between $d$ possible items; from these we need to predict preferences of the users for items they have not yet seen.

Collaborative Filtering, Collaborative Ranking, +1

The local convexity of solving systems of quadratic equations

no code implementations 25 Jun 2015 Chris D. White, Sujay Sanghavi, Rachel Ward

This paper considers the recovery of a rank $r$ positive semidefinite matrix $X X^T\in\mathbb{R}^{n\times n}$ from $m$ scalar measurements of the form $y_i := a_i^T X X^T a_i$ (i.e., quadratic measurements of $X$).

Quantum State Tomography

A New Sampling Technique for Tensors

no code implementations 17 Feb 2015 Srinadh Bhojanapalli, Sujay Sanghavi

In this paper we propose new techniques to sample arbitrary third-order tensors, with an objective of speeding up tensor algorithms that have recently gained popularity in machine learning.

Online Collaborative-Filtering on Graphs

no code implementations 7 Nov 2014 Siddhartha Banerjee, Sujay Sanghavi, Sanjay Shakkottai

We consider this problem under a simple natural model, wherein the number of items and the number of item-views are of the same order, and an 'access-graph' constrains which user is allowed to see which item.

Collaborative Filtering, Recommendation Systems

Greedy Subspace Clustering

no code implementations NeurIPS 2014 Dohyung Park, Constantine Caramanis, Sujay Sanghavi

We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces.

Clustering, Face Clustering, +1

Non-convex Robust PCA

no code implementations NeurIPS 2014 Praneeth Netrapalli, U. N. Niranjan, Sujay Sanghavi, Animashree Anandkumar, Prateek Jain

In contrast, existing methods for robust PCA, which are based on convex optimization, have $O(m^2n)$ complexity per iteration, and take $O(1/\epsilon)$ iterations, i.e., exponentially more iterations for the same accuracy.

Tighter Low-rank Approximation via Sampling the Leveraged Element

1 code implementation 14 Oct 2014 Srinadh Bhojanapalli, Prateek Jain, Sujay Sanghavi

The first is a new method to directly compute a low-rank approximation (in efficient factored form) to the product of two given matrices; it computes a small random set of entries of the product, and then executes weighted alternating minimization (as before) on these.

Alternating Minimization for Mixed Linear Regression

no code implementations 14 Oct 2013 Xinyang Yi, Constantine Caramanis, Sujay Sanghavi

Mixed linear regression involves the recovery of two (or more) unknown vectors from unlabeled linear measurements; that is, where each sample comes from exactly one of the vectors, but we do not know which one.


Completing Any Low-rank Matrix, Provably

no code implementations 12 Jun 2013 Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, Rachel Ward

Matrix completion, i.e., the exact and provable recovery of a low-rank matrix from a small subset of its elements, is currently only known to be possible if the matrix satisfies a restrictive structural constraint, known as incoherence, on its row and column spaces.

Matrix Completion

Phase Retrieval using Alternating Minimization

1 code implementation NeurIPS 2013 Praneeth Netrapalli, Prateek Jain, Sujay Sanghavi

Empirically, we demonstrate that alternating minimization performs similarly to recently proposed convex techniques for this problem (which are based on "lifting" to a convex matrix problem) in sample complexity and robustness to noise.


Clustering Sparse Graphs

no code implementations NeurIPS 2012 Yudong Chen, Sujay Sanghavi, Huan Xu

We develop a new algorithm to cluster sparse unweighted graphs, i.e., partition the nodes into disjoint clusters so that there is higher density within clusters and lower density across clusters.

Clustering, Stochastic Block Model

Improved Graph Clustering

no code implementations 11 Oct 2012 Yudong Chen, Sujay Sanghavi, Huan Xu

We show that, in the classic stochastic block model setting, it outperforms existing methods by polynomial factors when the cluster size is allowed to have general scalings.

Clustering, Graph Clustering, +1

Clustering Partially Observed Graphs via Convex Optimization

no code implementations 25 Apr 2011 Yudong Chen, Ali Jalali, Sujay Sanghavi, Huan Xu

This paper considers the problem of clustering a partially observed unweighted graph, i.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge.

Clustering, Stochastic Block Model

Matrix completion with column manipulation: Near-optimal sample-robustness-rank tradeoffs

no code implementations 10 Feb 2011 Yudong Chen, Huan Xu, Constantine Caramanis, Sujay Sanghavi

Moreover, we show by an information-theoretic argument that our guarantees are nearly optimal in terms of the fraction of sampled entries on the authentic columns, the fraction of corrupted columns, and the rank of the underlying matrix.

Collaborative Filtering, Matrix Completion

A Dirty Model for Multi-task Learning

no code implementations NeurIPS 2010 Ali Jalali, Sujay Sanghavi, Chao Ruan, Pradeep K. Ravikumar

However, these papers also caution that the performance of such block-regularized methods is very dependent on the extent to which the features are shared across tasks.

Multi-Task Learning, regression

Robust PCA via Outlier Pursuit

1 code implementation NeurIPS 2010 Huan Xu, Constantine Caramanis, Sujay Sanghavi

Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers.

Collaborative Filtering, Dimensionality Reduction, +1

Linear programming analysis of loopy belief propagation for weighted matching

no code implementations NeurIPS 2007 Sujay Sanghavi, Dmitry Malioutov, Alan S. Willsky

Loopy belief propagation has been employed in a wide variety of applications with great empirical success, but it comes with few theoretical guarantees.
