no code implementations • 29 Feb 2024 • Tina Behnia, Christos Thrampoulidis
Recent findings reveal that over-parameterized deep neural networks, trained beyond zero training error, exhibit a distinctive structural pattern at the final layer, termed Neural Collapse (NC).
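For intuition, here is a minimal numpy sketch (not from the paper) of one common NC diagnostic: the ratio of within-class to between-class feature scatter, which collapses toward zero under NC. The feature matrix `H` and label vector `y` are illustrative placeholders.

```python
import numpy as np

def nc1_ratio(H, y):
    """NC1 diagnostic: trace(within-class scatter) / trace(between-class scatter).

    H: (n, d) last-layer features; y: (n,) integer class labels.
    Values near zero indicate within-class variability has collapsed."""
    mu_G = H.mean(axis=0)                    # global feature mean
    Sw = np.zeros((H.shape[1], H.shape[1]))  # within-class scatter
    Sb = np.zeros_like(Sw)                   # between-class scatter
    for c in np.unique(y):
        Hc = H[y == c]
        mu_c = Hc.mean(axis=0)
        Sw += (Hc - mu_c).T @ (Hc - mu_c) / len(H)
        Sb += len(Hc) / len(H) * np.outer(mu_c - mu_G, mu_c - mu_G)
    return np.trace(Sw) / np.trace(Sb)
```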
no code implementations • 28 Feb 2024 • Christos Thrampoulidis
Specifically, for linear NTP models trained using gradient descent (GD), we make the following contributions: First, we determine NTP-separability conditions on the data under which GD can attain its lower bound.
no code implementations • 22 Feb 2024 • Wenlong Deng, Blair Chen, Xiaoxiao Li, Christos Thrampoulidis
We achieve fairness while maintaining utility by ensuring conditional independence between sensitive attributes and text embeddings, conditioned on the content.
no code implementations • 8 Feb 2024 • Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis
Self-attention, the core mechanism of transformers, distinguishes them from traditional neural networks and drives their outstanding performance.
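For reference, a minimal single-head self-attention sketch in numpy, following the standard softmax(QKᵀ/√d)V formulation; the weight matrices here are random placeholders, not anything from the paper.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: softmax(Q K^T / sqrt(d_head)) V.

    X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])       # (seq_len, seq_len)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # row-wise softmax
    return A @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                      # 5 tokens, model width 8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)              # (5, 4)
```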
no code implementations • 25 Jan 2024 • Xuechen Zhang, Mingchen Li, Jiasi Chen, Christos Thrampoulidis, Samet Oymak
Confirming this, we show that, under a Gaussian-mixture setting, the optimal SVM classifier for balanced accuracy needs to be adaptive to the class attributes.
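As a loose illustration of class-adaptive decision rules (not the paper's analysis), scikit-learn's `class_weight="balanced"` option reweights the SVM margins per class; the two-component Gaussian mixture below is synthetic.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
# Imbalanced two-class Gaussian mixture: 900 vs. 100 samples.
X = np.vstack([rng.normal(0, 1, (900, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 900 + [1] * 100)

plain = LinearSVC().fit(X, y)
adaptive = LinearSVC(class_weight="balanced").fit(X, y)  # per-class reweighting
for name, clf in [("plain", plain), ("balanced", adaptive)]:
    print(name, balanced_accuracy_score(y, clf.predict(X)))
```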
1 code implementation • 27 Oct 2023 • Wenlong Deng, Christos Thrampoulidis, Xiaoxiao Li
Existing Generalized FL (GFL) and Personalized FL (PFL) methods have limitations in balancing performance across both global and local data distributions.
no code implementations • 19 Oct 2023 • Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis
Finally, we demonstrate that these conditions are satisfied for a simple tokenized-mixture model.
no code implementations • 2 Oct 2023 • Jaidev Gill, Vala Vakilian, Christos Thrampoulidis
Supervised contrastive loss (SCL) is an alternative to cross-entropy (CE) for classification that uses similarities in the embedding space to allow for richer representations.
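For concreteness, a minimal numpy sketch of a supervised contrastive objective in the style of Khosla et al. (2020); batch construction and augmentation details, which vary across papers, are omitted.

```python
import numpy as np

def supcon_loss(Z, y, tau=0.1):
    """Supervised contrastive loss on embeddings Z (n, d) with labels y (n,)."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # L2-normalize embeddings
    sim = Z @ Z.T / tau                               # temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    log_den = np.log(np.exp(sim).sum(axis=1))         # per-anchor log-partition
    total, n_anchors = 0.0, 0
    for i in range(len(y)):
        pos = np.where((y == y[i]) & (np.arange(len(y)) != i))[0]
        if len(pos) == 0:                             # anchor with no positives
            continue
        total += -(sim[i, pos] - log_den[i]).mean()
        n_anchors += 1
    return total / n_anchors
```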
1 code implementation • 31 Aug 2023 • Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak
In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer-products of token pairs.
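Schematically, and simplifying the paper's notation, the equivalence pairs the attention weights with a max-margin problem of the form

$$\min_{W}\ \|W\|_F \quad \text{subject to} \quad (x_{i,\alpha_i}-x_{i,t})^\top W z_i \ \geq\ 1 \quad \text{for all } i \text{ and } t\neq\alpha_i,$$

where $x_{i,\alpha_i}$ denotes the optimal token of sequence $i$, $z_i$ the corresponding query, and the constraints separate optimal from non-optimal tokens; this is only a sketch, and the exact formulation and its variants are in the paper.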
no code implementations • 3 Aug 2023 • Liam Madden, Christos Thrampoulidis
In order to analyze general real analytic activations, we derive the precise generic rank of the network's Jacobian, which can be written in terms of Hadamard powers and the Khatri-Rao product.
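A quick numpy sketch of the two operations named here, for readers unfamiliar with them; the matrices are arbitrary examples.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product: column j is kron(A[:, j], B[:, j]).

    A: (m, n), B: (p, n) -> result: (m * p, n)."""
    m, n = A.shape
    p, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(m * p, n)

def hadamard_power(A, k):
    """Entrywise (Hadamard) k-th power of A."""
    return A ** k

A = np.arange(6.0).reshape(2, 3)
B = np.ones((4, 3))
print(khatri_rao(A, B).shape)   # (8, 3)
print(hadamard_power(A, 2))     # each entry squared
```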
no code implementations • 13 Jun 2023 • Ganesh Ramachandra Kini, Vala Vakilian, Tina Behnia, Jaidev Gill, Christos Thrampoulidis
Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy loss for classification.
no code implementations • 6 Jun 2023 • Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis
Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting.
1 code implementation • 3 Jun 2023 • Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis
Transformers have become the go-to architecture for language and vision tasks, yet their theoretical properties, especially memorization capacity, remain elusive.
no code implementations • 22 May 2023 • Hossein Taheri, Christos Thrampoulidis
Normalized gradient descent has shown substantial success in speeding up the convergence of exponentially-tailed loss functions (which includes exponential and logistic losses) on linear classifiers with separable data.
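For reference, a minimal sketch of normalized GD on the logistic loss for a linear classifier; the step size and iteration count are arbitrary, and labels are assumed to be ±1.

```python
import numpy as np

def normalized_gd_logistic(X, y, lr=0.5, steps=500):
    """Normalized GD: w <- w - lr * grad / ||grad||.

    X: (n, d), y in {-1, +1}. On separable data the raw gradient shrinks
    with the loss, so normalizing keeps the iterates moving."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)
        norm = np.linalg.norm(grad)
        if norm < 1e-12:
            break
        w -= lr * grad / norm
    return w
```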
no code implementations • 2 Apr 2023 • Chen Fan, Christos Thrampoulidis, Mark Schmidt
Modern machine learning models are often over-parameterized and, as a result, can interpolate the training data.
1 code implementation • 14 Mar 2023 • Tina Behnia, Ganesh Ramachandra Kini, Vala Vakilian, Christos Thrampoulidis
Aiming to extend this theory to non-linear models, we investigate the implicit geometry of classifiers and embeddings that are learned by different CE parameterizations.
no code implementations • 18 Feb 2023 • Hossein Taheri, Christos Thrampoulidis
Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error $\epsilon$ and their distance from initialization is $g(\epsilon)$, we demonstrate that gradient descent with $n$ training samples achieves training error $O(g(1/T)^2 /T)$ and generalization error $O(g(1/T)^2 /n)$ at iteration $T$, provided there are at least $m=\Omega(g(1/T)^4)$ hidden neurons.
1 code implementation • 1 Nov 2022 • Sadegh Mahdavi, Kevin Swersky, Thomas Kipf, Milad Hashemi, Christos Thrampoulidis, Renjie Liao
In this paper, we study the OOD generalization of neural algorithmic reasoning tasks, where the goal is to learn an algorithm (e.g., sorting, breadth-first search, and depth-first search) from input-output pairs using deep neural networks.
no code implementations • 15 Sep 2022 • Hossein Taheri, Christos Thrampoulidis
Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study algorithmic and generalization properties of decentralized learning with gradient descent on separable data.
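As a toy illustration (the protocol analyzed in the paper may differ), decentralized GD interleaves a gossip-averaging step over a doubly stochastic mixing matrix `W_mix` with local gradient steps:

```python
import numpy as np

def decentralized_gd(local_grads, W_mix, w0, lr=0.1, steps=200):
    """Each agent i holds a gradient oracle local_grads[i]; per round, agents
    average neighbors' iterates via W_mix, then take a local gradient step."""
    k = len(local_grads)
    ws = np.tile(w0, (k, 1))           # one iterate per agent
    for _ in range(steps):
        ws = W_mix @ ws                # consensus / mixing step
        for i, grad in enumerate(local_grads):
            ws[i] -= lr * grad(ws[i])  # local gradient step
    return ws.mean(axis=0)
```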
no code implementations • 10 Aug 2022 • Christos Thrampoulidis, Ganesh R. Kini, Vala Vakilian, Tina Behnia
However, we caution that convergence worsens with increasing imbalances.
no code implementations • 25 Jun 2022 • Tina Behnia, Ke Wang, Christos Thrampoulidis
Overparameterized models fail to generalize well in the presence of data imbalance even when combined with traditional techniques for mitigating imbalances.
no code implementations • 25 May 2022 • Haoyuan Sun, Kwangjun Ahn, Christos Thrampoulidis, Navid Azizan
Driven by the empirical success and wide use of deep neural networks, the question of understanding the generalization performance of overparameterized models has attracted increasing attention.
no code implementations • 12 May 2022 • Ahmadreza Moradipari, Mohammad Ghavamzadeh, Taha Rajabzadeh, Christos Thrampoulidis, Mahnoosh Alizadeh
In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments.
3 code implementations • 4 May 2022 • Davoud Ataee Tarzanagh, Mingchen Li, Christos Thrampoulidis, Samet Oymak
Standard federated optimization methods successfully apply to stochastic problems with single-level structure.
1 code implementation • NeurIPS 2021 • Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, Samet Oymak
Our experimental findings are complemented with theoretical insights on loss function design and the benefits of train-validation split.
1 code implementation • 20 Sep 2021 • Kabir Aladin Chandrasekher, Ashwin Pananjady, Christos Thrampoulidis
In particular, provided each iteration can be written as the solution to a convex optimization problem satisfying some natural conditions, we leverage Gaussian comparison theorems to derive a deterministic sequence that provides sharp upper and lower bounds on the error of the algorithm with sample-splitting.
no code implementations • NeurIPS 2021 • Ke Wang, Vidya Muthukumar, Christos Thrampoulidis
The literature on "benign overfitting" in overparameterized models has been mostly restricted to regression or binary classification; however, modern machine learning operates in the multiclass setting.
no code implementations • 11 Jun 2021 • Sanae Amani, Christos Thrampoulidis, Lin F. Yang
Safety in reinforcement learning has become increasingly important in recent years.
no code implementations • NeurIPS 2021 • Sanae Amani, Christos Thrampoulidis
Out of the rich family of generalized linear bandits, perhaps the most well-studied are logistic bandits, which are used in problems with binary rewards: for instance, when the learner/agent tries to maximize profit over a user who can select one of two possible outcomes (e.g., 'click' vs. 'no-click').
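The binary-reward model this describes fits in a few lines; the exploration strategy (UCB, Thompson sampling, etc.) is the part such papers actually study and is omitted here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
theta = rng.normal(size=3)           # unknown parameter to be learned
arms = rng.normal(size=(10, 3))      # available actions

# Logistic bandit reward: r ~ Bernoulli(sigmoid(x . theta)),
# e.g., 1 = 'click', 0 = 'no-click'.
x = arms[rng.integers(len(arms))]
r = rng.binomial(1, sigmoid(x @ theta))
```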
1 code implementation • NeurIPS 2021 • Ganesh Ramachandra Kini, Orestis Paraskevas, Samet Oymak, Christos Thrampoulidis
The goal in label-imbalanced and group-sensitive classification is to optimize relevant metrics such as balanced error and equal opportunity.
no code implementations • 16 Dec 2020 • Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis
Deep networks are typically trained with many more parameters than the size of the training dataset.
no code implementations • 1 Dec 2020 • Sanae Amani, Christos Thrampoulidis
For this problem, we propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network.
no code implementations • 18 Nov 2020 • Ke Wang, Christos Thrampoulidis
Combining the two, we present novel sufficient conditions on the covariance spectrum and on the signal-to-noise ratio (SNR) under which interpolating estimators achieve asymptotically optimal performance as overparameterization increases.
no code implementations • NeurIPS 2020 • Christos Thrampoulidis, Samet Oymak, Mahdi Soltanolkotabi
Our theoretical analysis allows us to precisely characterize how the test error varies over different training algorithms, data distributions, problem dimensions as well as number of classes, inter/intra class correlations and class priors.
no code implementations • 26 Oct 2020 • Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis
It has been consistently reported that many machine learning models are susceptible to adversarial attacks: small additive adversarial perturbations applied to data points can cause misclassification.
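For concreteness, the classic sign-gradient (FGSM-style) construction of such a small additive perturbation, shown on a linear classifier; this is a generic illustration, not the attack model analyzed in the paper.

```python
import numpy as np

def fgsm_perturb(x, grad_loss_x, eps=0.1):
    """Move each coordinate of x by eps in the direction that increases the loss."""
    return x + eps * np.sign(grad_loss_x)

# For a linear classifier sign(w . x) with logistic loss, the input gradient
# points along -y * w, so the attack pushes x against its own margin.
w = np.array([1.0, -2.0, 0.5])
x, y = np.array([0.3, 0.1, -0.2]), 1
x_adv = fgsm_perturb(x, -y * w, eps=0.05)
```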
no code implementations • NeurIPS 2020 • Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh
For this problem, we present two novel algorithms, stage-wise conservative linear Thompson Sampling (SCLTS) and stage-wise conservative linear UCB (SCLUCB), that respect the baseline constraints and enjoy probabilistic regret bounds of order $O(\sqrt{T} \log^{3/2} T)$ and $O(\sqrt{T} \log T)$, respectively.
no code implementations • 7 Aug 2020 • Marco Mondelli, Christos Thrampoulidis, Ramji Venkataramanan
This allows us to compute the Bayes-optimal combination of $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$, given the limiting distribution of the signal $\boldsymbol x$.
no code implementations • 19 Jun 2020 • Mingchen Li, Yahya Sattar, Christos Thrampoulidis, Samet Oymak
Model pruning is an essential procedure for building compact and computationally-efficient machine learning models.
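As background, the simplest pruning baseline is magnitude pruning, sketched below; this is generic context, not the specific procedure studied in the paper.

```python
import numpy as np

def magnitude_prune(W, sparsity=0.9):
    """Zero out the smallest-magnitude entries of W, keeping roughly a
    (1 - sparsity) fraction of the weights (ties may keep slightly more)."""
    k = int(np.ceil((1 - sparsity) * W.size))
    thresh = np.sort(np.abs(W).ravel())[-k]
    return W * (np.abs(W) >= thresh)

W = np.random.default_rng(0).normal(size=(4, 4))
print(magnitude_prune(W, sparsity=0.75))   # 12 of 16 entries zeroed
```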
no code implementations • 16 Jun 2020 • Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis
For a stylized setting with Gaussian features and problem dimensions that grow large at a proportional rate, we start with sharp performance characterizations and then derive tight lower bounds on the estimation and prediction error that hold over a wide class of loss functions and for any value of the regularization parameter.
no code implementations • L4DC 2020 • Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis
Many applications require a learner to make sequential decisions given uncertainty regarding both the system’s payoff function and safety constraints.
no code implementations • 17 Feb 2020 • Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis
We study convex empirical risk minimization for high-dimensional inference in binary models.
no code implementations • 30 Jan 2020 • Ganesh Kini, Christos Thrampoulidis
In a recent paper [Deng, Kammoun, Thrampoulidis, 2019], the authors studied binary linear classification models and showed that the test error of gradient descent (GD) with logistic loss undergoes a double-descent (DD) phenomenon.
no code implementations • 13 Nov 2019 • Zeyu Deng, Abla Kammoun, Christos Thrampoulidis
We consider a model for logistic regression where only a subset of features of size $p$ is used for training a linear classifier over $n$ training samples.
no code implementations • 6 Nov 2019 • Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis
We compare the performance of our algorithm with UCB-based safe algorithms and highlight how the inherently randomized nature of TS leads to a superior performance in expanding the set of safe actions the algorithm has access to at each round.
no code implementations • NeurIPS 2019 • Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis
During the pure exploration phase the learner chooses her actions at random from a restricted set of safe actions with the goal of learning a good approximation of the entire unknown safe set.
no code implementations • 12 Aug 2019 • Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis
We study the performance of a wide class of convex optimization-based estimators for recovering a signal from corrupted one-bit measurements in high-dimensions.
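A minimal simulation of the measurement model, together with one simple convex estimator from this family (maximize $\langle y, Ax\rangle$ over the unit ball, whose closed-form solution is the normalized back-projection); the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 400
x0 = rng.normal(size=n)
x0 /= np.linalg.norm(x0)        # scale is unrecoverable from one-bit data
A = rng.normal(size=(m, n))
y = np.sign(A @ x0)             # one-bit (sign) measurements

# maximize <y, Ax> subject to ||x||_2 <= 1  =>  x_hat = A^T y / ||A^T y||.
x_hat = A.T @ y
x_hat /= np.linalg.norm(x_hat)
print(x_hat @ x0)               # cosine similarity with the true signal
```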
no code implementations • 11 Dec 2017 • Christos Thrampoulidis, Ankit Singh Rawat
Unfortunately, both least-squares and the Lasso fail to recover $\mathbf{x}_0$ when $\mu_\ell=0$.
no code implementations • NeurIPS 2015 • Christos Thrampoulidis, Ehsan Abbasi, Babak Hassibi
In this work, we considerably strengthen these results by obtaining explicit expressions for $\|\hat x-\mu x_0\|_2$, for the regularized Generalized-LASSO, that are asymptotically precise when $m$ and $n$ grow large.
no code implementations • 4 Nov 2013 • Samet Oymak, Christos Thrampoulidis, Babak Hassibi
The first LASSO estimator assumes a-priori knowledge of $f(x_0)$ and is given by $\arg\min_{x}\{\|y-Ax\|_2~\text{subject to}~f(x)\leq f(x_0)\}$.
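This constrained form is straightforward to set up with a generic convex solver; the sketch below uses cvxpy with $f$ taken to be the $\ell_1$-norm and synthetic data, purely for illustration.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 200, 80, 5
x0 = np.zeros(n)
x0[:s] = rng.normal(size=s)                 # s-sparse signal
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x0 + 0.05 * rng.normal(size=m)

# Constrained LASSO with f = l1-norm and a-priori knowledge f(x0):
# minimize ||y - Ax||_2 subject to ||x||_1 <= ||x0||_1.
x = cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm(y - A @ x, 2)),
                     [cp.norm(x, 1) <= np.linalg.norm(x0, 1)])
problem.solve()
print(np.linalg.norm(x.value - x0))         # estimation error
```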