Search Results for author: Christos Thrampoulidis

Found 50 papers, 9 papers with code

Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features

no code implementations29 Feb 2024 Tina Behnia, Christos Thrampoulidis

Recent findings reveal that over-parameterized deep neural networks, trained beyond zero training error, exhibit a distinctive structural pattern at the final layer, termed Neural Collapse (NC).
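
Among its predictions, Neural Collapse says that the centered, normalized class-mean embeddings form a simplex equiangular tight frame (ETF), with all pairwise cosines equal to $-1/(K-1)$. Below is a minimal numpy sketch (an illustration, not the paper's code) that measures deviation from this geometry:

```python
import numpy as np

def etf_gap(class_means):
    """Max deviation of pairwise cosines from the ideal ETF value -1/(K-1)."""
    K = class_means.shape[0]
    M = class_means - class_means.mean(axis=0)        # center the class means
    M = M / np.linalg.norm(M, axis=1, keepdims=True)  # normalize each mean
    cos = M @ M.T                                     # pairwise cosine similarities
    off = cos[~np.eye(K, dtype=bool)]                 # off-diagonal entries only
    return np.abs(off + 1.0 / (K - 1)).max()

# Sanity check: an exact simplex ETF (rows of a centered identity) gives gap ~ 0.
K = 4
print(etf_gap(np.eye(K) - np.ones((K, K)) / K))       # ~ 1e-16
```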

Representation Learning

Implicit Bias of Next-Token Prediction

no code implementations28 Feb 2024 Christos Thrampoulidis

Specifically, for linear NTP models trained using gradient descent (GD), we make the following contributions: Firstly, we determine NTP-separability conditions on the data, under which GD can drive the NTP loss to its lower bound.

Content Conditional Debiasing for Fair Text Embedding

no code implementations22 Feb 2024 Wenlong Deng, Blair Chen, Xiaoxiao Li, Christos Thrampoulidis

We achieve fairness while preserving utility by ensuring conditional independence between sensitive attributes and text embeddings, conditioned on the content.
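
As a purely hypothetical illustration (the pairing scheme and penalty below are assumptions for exposition, not the paper's method), one simple way to push embeddings toward such conditional independence is to penalize differences between counterfactual pairs that share content but differ in the sensitive attribute:

```python
import numpy as np

def counterfactual_penalty(emb_a, emb_b):
    """Mean squared distance between embeddings of attribute-swapped text pairs."""
    return np.mean(np.sum((emb_a - emb_b) ** 2, axis=1))

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(8, 16))                # embeddings of original texts
emb_b = emb_a + 0.1 * rng.normal(size=(8, 16))  # same content, attribute swapped
print(counterfactual_penalty(emb_a, emb_b))     # small value = near-invariance
```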

Fairness

Implicit Bias and Fast Convergence Rates for Self-attention

no code implementations8 Feb 2024 Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis

Self-attention, the core mechanism of transformers, distinguishes them from traditional neural networks and drives their outstanding performance.

Binary Classification · regression

Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective

no code implementations25 Jan 2024 Xuechen Zhang, Mingchen Li, Jiasi Chen, Christos Thrampoulidis, Samet Oymak

Confirming this, under a Gaussian mixture setting, we show that the optimal SVM classifier for balanced accuracy needs to be adaptive to the class attributes.
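
A quick hedged illustration of the point with scikit-learn (the data and weighting scheme are arbitrary choices here, not the paper's CAP method): on an imbalanced Gaussian mixture, a class-adaptive SVM typically improves balanced accuracy over the plain one.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
n_maj, n_min, d = 1000, 50, 5
X = np.vstack([rng.normal(+0.4, 1.0, (n_maj, d)),   # majority class
               rng.normal(-0.4, 1.0, (n_min, d))])  # minority class
y = np.array([0] * n_maj + [1] * n_min)

for w in [None, "balanced"]:                        # plain vs class-adaptive SVM
    clf = LinearSVC(class_weight=w, max_iter=10000).fit(X, y)
    print(w, round(balanced_accuracy_score(y, clf.predict(X)), 3))
```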

Attribute · Fairness

Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning

1 code implementation27 Oct 2023 Wenlong Deng, Christos Thrampoulidis, Xiaoxiao Li

Existing Generalized FL (GFL) and Personalized FL (PFL) methods have limitations in balancing performance across both global and local data distributions.

Personalized Federated Learning · Visual Prompt Tuning

On the Optimization and Generalization of Multi-head Attention

no code implementations19 Oct 2023 Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis

Finally, we demonstrate that these conditions are satisfied for a simple tokenized-mixture model.

Engineering the Neural Collapse Geometry of Supervised-Contrastive Loss

no code implementations2 Oct 2023 Jaidev Gill, Vala Vakilian, Christos Thrampoulidis

Supervised-contrastive loss (SCL) is an alternative to cross-entropy (CE) for classification tasks that makes use of similarities in the embedding space to allow for richer representations.
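
For reference, here is a minimal numpy sketch of the batch SCL objective in question, following the standard formulation of Khosla et al. (the temperature and variable names are arbitrary choices):

```python
import numpy as np

def scl(Z, y, tau=0.1):
    """Supervised contrastive loss over a batch of embeddings Z with labels y."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # project to the unit sphere
    S = Z @ Z.T / tau                                 # scaled cosine similarities
    n, loss = len(y), 0.0
    for i in range(n):
        mask = np.arange(n) != i                      # exclude the anchor itself
        pos = mask & (y == y[i])                      # same-label positives
        log_prob = S[i] - np.log(np.exp(S[i][mask]).sum())
        loss -= log_prob[pos].mean()                  # average over positives
    return loss / n

rng = np.random.default_rng(0)
print(scl(rng.normal(size=(8, 4)), np.array([0, 0, 1, 1, 2, 2, 3, 3])))
```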

Transformers as Support Vector Machines

1 code implementation31 Aug 2023 Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak

In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer-products of token pairs.
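
A toy numpy experiment (an illustration, not the paper's construction) hints at why: scaling the attention parameters along a fixed direction hardens the softmax into selecting a single token, the token-separation behavior that the SVM equivalence formalizes.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())   # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))   # 5 tokens of dimension 4
q = rng.normal(size=4)        # query vector
W = rng.normal(size=(4, 4))   # fixed attention weight direction

for scale in [1, 10, 100]:    # grow the parameter norm along direction W
    attn = softmax((X @ W @ q) * scale)
    print(scale, np.round(attn, 3))   # mass concentrates on one token
```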

Memory capacity of two layer neural networks with smooth activations

no code implementations3 Aug 2023 Liam Madden, Christos Thrampoulidis

In order to analyze general real analytic activations, we derive the precise generic rank of the network's Jacobian, which can be written in terms of Hadamard powers and the Khatri-Rao product.
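
To fix notation, here is a small numpy sketch of the two operations (the definitions are standard; the example matrix is arbitrary): the Hadamard power acts entrywise, while the column-wise Khatri-Rao product pairs columns via Kronecker products.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product: column j is kron(A[:, j], B[:, j])."""
    (m, k), (n, k2) = A.shape, B.shape
    assert k == k2
    return (A[:, None, :] * B[None, :, :]).reshape(m * n, k)

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
print(np.linalg.matrix_rank(A ** 2))            # Hadamard square: entrywise, rank <= 3
print(np.linalg.matrix_rank(khatri_rao(A, A)))  # 9 x 4 matrix; generically rank 4
```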

Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of ReLU and Batching

no code implementations13 Jun 2023 Ganesh Ramachandra Kini, Vala Vakilian, Tina Behnia, Jaidev Gill, Christos Thrampoulidis

Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy loss for classification.

On the Role of Attention in Prompt-tuning

no code implementations6 Jun 2023 Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis

Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting.

Memorization Capacity of Multi-Head Attention in Transformers

1 code implementation3 Jun 2023 Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis

Transformers have become the go-to architecture for language and vision tasks, yet their theoretical properties, especially memorization capacity, remain elusive.

Image Classification · Memorization +1

Fast Convergence in Learning Two-Layer Neural Networks with Separable Data

no code implementations22 May 2023 Hossein Taheri, Christos Thrampoulidis

Normalized gradient descent has shown substantial success in speeding up the convergence of exponentially-tailed loss functions (which includes exponential and logistic losses) on linear classifiers with separable data.
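
A minimal sketch of the idea (the loss-normalized step size and the synthetic separable data are assumptions here, not the paper's exact algorithm or rates):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2, 1, (20, 2)), rng.normal(-2, 1, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)               # linearly separable labels
w = np.zeros(2)

for t in range(100):
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0, -margins))    # logistic loss, stable form
    p = np.exp(-np.logaddexp(0, margins))        # sigmoid(-margin), stable form
    grad = -(y * p) @ X / len(y)
    w -= 0.1 * grad / max(loss, 1e-12)           # step normalized by current loss
print(np.min(y * (X @ w)) / np.linalg.norm(w))   # normalized margin grows over time
```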

Generalization Bounds

Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Łojasiewicz Condition

no code implementations2 Apr 2023 Chen Fan, Christos Thrampoulidis, Mark Schmidt

Modern machine learning models are often over-parameterized and, as a result, can interpolate the training data.
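
For concreteness, a hedged sketch of random reshuffling on an interpolating least-squares problem, where the Polyak-Łojasiewicz condition holds (problem sizes and step size are arbitrary choices): a fresh permutation is drawn each epoch and every sample is visited exactly once, unlike with-replacement SGD.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star                         # consistent system: interpolation possible
x = np.zeros(d)

for epoch in range(300):
    for i in rng.permutation(n):       # fresh permutation; each sample seen once
        g = (A[i] @ x - b[i]) * A[i]   # gradient of 0.5 * (a_i @ x - b_i)^2
        x -= 0.01 * g
print(np.linalg.norm(x - x_star))      # -> 0 under interpolation
```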

On the Implicit Geometry of Cross-Entropy Parameterizations for Label-Imbalanced Data

1 code implementation14 Mar 2023 Tina Behnia, Ganesh Ramachandra Kini, Vala Vakilian, Christos Thrampoulidis

Aiming to extend this theory to non-linear models, we investigate the implicit geometry of classifiers and embeddings that are learned by different CE parameterizations.

Generalization and Stability of Interpolating Neural Networks with Minimal Width

no code implementations18 Feb 2023 Hossein Taheri, Christos Thrampoulidis

Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error $\epsilon$ and their distance from initialization is $g(\epsilon)$, we demonstrate that gradient descent with $n$ training data achieves training error $O(g(1/T)^2 /T)$ and generalization error $O(g(1/T)^2 /n)$ at iteration $T$, provided there are at least $m=\Omega(g(1/T)^4)$ hidden neurons.

Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks

1 code implementation1 Nov 2022 Sadegh Mahdavi, Kevin Swersky, Thomas Kipf, Milad Hashemi, Christos Thrampoulidis, Renjie Liao

In this paper, we study the OOD generalization of neural algorithmic reasoning tasks, where the goal is to learn an algorithm (e.g., sorting, breadth-first search, and depth-first search) from input-output pairs using deep neural networks.

Data Augmentation · Out-of-Distribution Generalization

On Generalization of Decentralized Learning with Separable Data

no code implementations15 Sep 2022 Hossein Taheri, Christos Thrampoulidis

Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study algorithmic and generalization properties of decentralized learning with gradient descent on separable data.
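
A rough sketch of such a setup (the ring topology, gossip averaging, and constants are assumptions here, not the paper's exact protocol): each agent takes a local logistic-loss gradient step on its data shard and averages its iterate with its neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)
agents, d = 4, 2
X = np.vstack([rng.normal(+2, 1, (50, d)), rng.normal(-2, 1, (50, d))])
y = np.array([1] * 50 + [-1] * 50)                # separable two-class data
shards = np.array_split(rng.permutation(100), agents)
W = np.zeros((agents, d))
# Doubly-stochastic ring mixing matrix: average self with the two neighbors.
P = sum(np.roll(np.eye(agents), s, axis=1) for s in (-1, 0, 1)) / 3

for t in range(300):
    G = np.zeros_like(W)
    for a, idx in enumerate(shards):
        m = y[idx] * (X[idx] @ W[a])
        p = np.exp(-np.logaddexp(0, m))           # sigmoid(-margin), stable form
        G[a] = -(y[idx] * p) @ X[idx] / len(idx)  # local logistic-loss gradient
    W = P @ W - 0.5 * G                           # gossip step + local step
print(np.min(y * (X @ W.mean(axis=0))))           # positive margin: data separated
```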

Generalization Bounds

On how to avoid exacerbating spurious correlations when models are overparameterized

no code implementations25 Jun 2022 Tina Behnia, Ke Wang, Christos Thrampoulidis

Overparameterized models fail to generalize well in the presence of data imbalance even when combined with traditional techniques for mitigating imbalances.

Generalization Bounds · imbalanced classification

Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently

no code implementations25 May 2022 Haoyuan Sun, Kwangjun Ahn, Christos Thrampoulidis, Navid Azizan

Driven by the empirical success and wide use of deep neural networks, understanding the generalization performance of overparameterized models has become an increasingly popular question.

Open-Ended Question Answering

Multi-Environment Meta-Learning in Stochastic Linear Bandits

no code implementations12 May 2022 Ahmadreza Moradipari, Mohammad Ghavamzadeh, Taha Rajabzadeh, Christos Thrampoulidis, Mahnoosh Alizadeh

In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments.

Meta-Learning

AutoBalance: Optimized Loss Functions for Imbalanced Data

1 code implementation NeurIPS 2021 Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, Samet Oymak

Our experimental findings are complemented with theoretical insights on loss function design and the benefits of train-validation split.

Data Augmentation · Fairness

Sharp global convergence guarantees for iterative nonconvex optimization: A Gaussian process perspective

1 code implementation20 Sep 2021 Kabir Aladin Chandrasekher, Ashwin Pananjady, Christos Thrampoulidis

In particular, provided each iteration can be written as the solution to a convex optimization problem satisfying some natural conditions, we leverage Gaussian comparison theorems to derive a deterministic sequence that provides sharp upper and lower bounds on the error of the algorithm with sample-splitting.

Retrieval

Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation

no code implementations NeurIPS 2021 Ke Wang, Vidya Muthukumar, Christos Thrampoulidis

The literature on "benign overfitting" in overparameterized models has been mostly restricted to regression or binary classification; however, modern machine learning operates in the multiclass setting.

Binary Classification · Classification

UCB-based Algorithms for Multinomial Logistic Regression Bandits

no code implementations NeurIPS 2021 Sanae Amani, Christos Thrampoulidis

Out of the rich family of generalized linear bandits, perhaps the best studied are logistic bandits, which are used in problems with binary rewards: for instance, when the learner/agent tries to maximize the profit over a user that can select one of two possible outcomes (e.g., 'click' vs. 'no-click').

regression

Decentralized Multi-Agent Linear Bandits with Safety Constraints

no code implementations1 Dec 2020 Sanae Amani, Christos Thrampoulidis

For this problem, we propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network.

Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization

no code implementations18 Nov 2020 Ke Wang, Christos Thrampoulidis

Combining the two, we present novel sufficient conditions on the covariance spectrum and on the signal-to-noise ratio (SNR) under which interpolating estimators achieve asymptotically optimal performance as overparameterization increases.

Binary Classification · General Classification

Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View

no code implementations NeurIPS 2020 Christos Thrampoulidis, Samet Oymak, Mahdi Soltanolkotabi

Our theoretical analysis allows us to precisely characterize how the test error varies over different training algorithms, data distributions, problem dimensions as well as number of classes, inter/intra class correlations and class priors.

Binary Classification · Classification +2

Asymptotic Behavior of Adversarial Training in Binary Classification

no code implementations26 Oct 2020 Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

It has been consistently reported that many machine learning models are susceptible to adversarial attacks, i.e., small additive adversarial perturbations applied to data points can cause misclassification.

Binary Classification · Classification +1

Stage-wise Conservative Linear Bandits

no code implementations NeurIPS 2020 Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

For this problem, we present two novel algorithms, stage-wise conservative linear Thompson Sampling (SCLTS) and stage-wise conservative linear UCB (SCLUCB), that respect the baseline constraints and enjoy probabilistic regret bounds of order $O(\sqrt{T} \log^{3/2} T)$ and $O(\sqrt{T} \log T)$, respectively.
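
For context, here is a minimal sketch of plain linear Thompson sampling, the unconstrained routine that SCLTS augments with stage-wise conservative checks (the safety logic is omitted, and all constants are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, sigma = 3, 500, 0.1
theta_star = np.array([0.6, -0.2, 0.4])          # unknown reward parameter
arms = rng.normal(size=(20, d))                  # fixed finite action set
V, bvec = np.eye(d), np.zeros(d)                 # ridge statistics (lambda = 1)

for t in range(T):
    mu = np.linalg.solve(V, bvec)                             # ridge estimate
    theta_t = rng.multivariate_normal(mu, np.linalg.inv(V))   # posterior sample
    x = arms[np.argmax(arms @ theta_t)]                       # play sampled-best arm
    r = x @ theta_star + sigma * rng.normal()                 # noisy linear reward
    V += np.outer(x, x)
    bvec += r * x
print(np.linalg.norm(np.linalg.solve(V, bvec) - theta_star))  # estimation error shrinks
```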

Thompson Sampling

Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models

no code implementations7 Aug 2020 Marco Mondelli, Christos Thrampoulidis, Ramji Venkataramanan

This allows us to compute the Bayes-optimal combination of $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$, given the limiting distribution of the signal $\boldsymbol x$.

Exploring Weight Importance and Hessian Bias in Model Pruning

no code implementations19 Jun 2020 Mingchen Li, Yahya Sattar, Christos Thrampoulidis, Samet Oymak

Model pruning is an essential procedure for building compact and computationally-efficient machine learning models.

Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions

no code implementations16 Jun 2020 Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

For a stylized setting with Gaussian features and problem dimensions that grow large at a proportional rate, we start with sharp performance characterizations and then derive tight lower bounds on the estimation and prediction error that hold over a wide class of loss functions and for any value of the regularization parameter.

Regret Bounds for Safe Gaussian Process Bandit Optimization

no code implementations5 May 2020 Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

Many applications require a learner to make sequential decisions given uncertainty regarding both the system's payoff function and safety constraints.

Gaussian Processes

Sharp Asymptotics and Optimal Performance for Inference in Binary Models

no code implementations17 Feb 2020 Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

We study convex empirical risk minimization for high-dimensional inference in binary models.

Analytic Study of Double Descent in Binary Classification: The Impact of Loss

no code implementations30 Jan 2020 Ganesh Kini, Christos Thrampoulidis

In a recent paper [Deng, Kammoun, Thrampoulidis, 2019], the authors studied binary linear classification models and showed that the test error of gradient descent (GD) with logistic loss undergoes a double descent (DD).

Binary Classification · General Classification

A Model of Double Descent for High-dimensional Binary Linear Classification

no code implementations13 Nov 2019 Zeyu Deng, Abla Kammoun, Christos Thrampoulidis

We consider a model for logistic regression where only a subset of features of size $p$ is used for training a linear classifier over $n$ training samples.
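
A hedged toy version of this experiment (a min-norm least-squares surrogate on sign labels, not the paper's precise asymptotic model): training on only the first $p$ features, the test error typically spikes near the interpolation threshold $p = n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 80
w = rng.normal(size=d) / np.sqrt(d)                 # ground-truth direction
Xtr, Xte = rng.normal(size=(n, d)), rng.normal(size=(2000, d))
ytr, yte = np.sign(Xtr @ w), np.sign(Xte @ w)

for p in [10, 30, 40, 50, 80]:                      # p = 40 is the threshold p = n
    beta = np.linalg.pinv(Xtr[:, :p]) @ ytr         # min-norm least-squares fit
    err = np.mean(np.sign(Xte[:, :p] @ beta) != yte)
    print(p, round(err, 3))                         # error typically peaks near p = n
```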

Classification · General Classification +2

Safe Linear Thompson Sampling with Side Information

no code implementations6 Nov 2019 Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

We compare the performance of our algorithm with UCB-based safe algorithms and highlight how the inherently randomized nature of TS leads to superior performance in expanding the set of safe actions the algorithm has access to at each round.

Thompson Sampling

Linear Stochastic Bandits Under Safety Constraints

no code implementations NeurIPS 2019 Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

During the pure exploration phase the learner chooses her actions at random from a restricted set of safe actions with the goal of learning a good approximation of the entire unknown safe set.

Safe Exploration

Sharp Guarantees for Solving Random Equations with One-Bit Information

no code implementations12 Aug 2019 Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

We study the performance of a wide class of convex optimization-based estimators for recovering a signal from corrupted one-bit measurements in high-dimensions.

Lifting high-dimensional nonlinear models with Gaussian regressors

no code implementations11 Dec 2017 Christos Thrampoulidis, Ankit Singh Rawat

Unfortunately, both least-squares and the Lasso fail to recover $\mathbf{x}_0$ when $\mu_\ell=0$.

LASSO with Non-linear Measurements is Equivalent to One With Linear Measurements

no code implementations NeurIPS 2015 Christos Thrampoulidis, Ehsan Abbasi, Babak Hassibi

In this work, we considerably strengthen these results by obtaining explicit expressions for $\|\hat x-\mu x_0\|_2$, for the regularized Generalized-LASSO, that are asymptotically precise when $m$ and $n$ grow large.

The Squared-Error of Generalized LASSO: A Precise Analysis

no code implementations4 Nov 2013 Samet Oymak, Christos Thrampoulidis, Babak Hassibi

The first LASSO estimator assumes a priori knowledge of $f(x_0)$ and is given by $\arg\min_{x}\{\|y-Ax\|_2~\text{subject to}~f(x)\leq f(x_0)\}$.
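
A minimal cvxpy sketch of that first estimator, taking $f$ to be the $\ell_1$ norm for concreteness (problem sizes and noise level are arbitrary choices):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m, k = 100, 60, 5
x0 = np.zeros(n); x0[:k] = rng.normal(size=k)    # k-sparse ground truth
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x0 + 0.05 * rng.normal(size=m)

x = cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm(y - A @ x, 2)),
                     [cp.norm(x, 1) <= np.linalg.norm(x0, 1)])  # side info f(x0)
problem.solve()
print(np.linalg.norm(x.value - x0))              # small estimation error
```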
