Search Results for author: Samet Oymak

Found 65 papers, 15 papers with code

TREACLE: Thrifty Reasoning via Context-Aware LLM and Prompt Selection

no code implementations 17 Apr 2024 Xuechen Zhang, Zijian Huang, Ege Onur Taga, Carlee Joe-Wong, Samet Oymak, Jiasi Chen

Recent successes in natural language processing have led to the proliferation of large language models (LLMs) by multiple providers.

GSM8K Navigate

Mechanics of Next Token Prediction with Self-Attention

no code implementations 12 Mar 2024 Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak

We show that training self-attention with gradient descent learns an automaton which generates the next token in two distinct steps: (1) Hard retrieval: given an input sequence, self-attention precisely selects the high-priority input tokens associated with the last input token.

Retrieval
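
A minimal numpy sketch (an illustration only, not the paper's construction) of the "hard retrieval" behavior described above: as the attention logits grow in scale, softmax attention concentrates its mass on the highest-scoring ("highest-priority") token.

import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, T = 8, 6
tokens = rng.normal(size=(T, d))        # input sequence x_1, ..., x_T
query = tokens[-1]                       # the last input token acts as the query
W = rng.normal(size=(d, d))              # combined key-query weights

scores = tokens @ W @ query              # attention logits ("priorities")
for scale in [1, 5, 50]:                 # growing scale mimics larger trained attention weights
    attn = softmax(scale * scores)
    print(scale, np.round(attn, 3))      # mass concentrates on argmax(scores): hard retrieval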

From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers

no code implementations 21 Feb 2024 M. Emrullah Ildiz, Yixiao Huang, Yingcong Li, Ankit Singh Rawat, Samet Oymak

Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation.

Text Generation

FLASH: Federated Learning Across Simultaneous Heterogeneities

no code implementations 13 Feb 2024 Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury

The key premise of federated learning (FL) is to train ML models across a diverse set of data-owners (clients), without exchanging local data.

Federated Learning Multi-Armed Bandits

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

2 code implementations 6 Feb 2024 Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos

State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention.

In-Context Learning Language Modelling +1
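
As an illustration of the ingredients named in the abstract above (gating and input-dependent token selection), here is a minimal numpy sketch of a gated, input-dependent linear recurrence; it is a toy, not Mamba's actual parameterization.

import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 10
x = rng.normal(size=(T, d))              # input token embeddings
Wa, Wb = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = np.zeros(d)
for t in range(T):
    a = sigmoid(x[t] @ Wa)               # input-dependent forget gate
    b = sigmoid(x[t] @ Wb)               # input-dependent input gate ("token selection")
    h = a * h + b * x[t]                 # linear-time recurrence, no quadratic attention
print(np.round(h, 3))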

Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective

no code implementations 25 Jan 2024 Xuechen Zhang, Mingchen Li, Jiasi Chen, Christos Thrampoulidis, Samet Oymak

Confirming this, under a Gaussian mixture setting, we show that the optimal SVM classifier for balanced accuracy needs to be adaptive to the class attributes.

Attribute Fairness
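
A small sketch of the phenomenon mentioned above, using scikit-learn (assumed available): on an imbalanced two-class Gaussian mixture, reweighting the loss by class frequency shifts the linear decision boundary and typically improves balanced accuracy. This is a generic class-prior adjustment for illustration, not the paper's class-attribute prior method.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
n_maj, n_min = 2000, 100                     # heavy class imbalance
X = np.vstack([rng.normal(-1.0, 1.0, size=(n_maj, 2)),
               rng.normal(+1.0, 1.0, size=(n_min, 2))])
y = np.array([0] * n_maj + [1] * n_min)

for weighting in (None, "balanced"):
    clf = LogisticRegression(class_weight=weighting).fit(X, y)
    print(weighting, round(balanced_accuracy_score(y, clf.predict(X)), 3))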

Plug-and-Play Transformer Modules for Test-Time Adaptation

no code implementations 6 Jan 2024 Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury

Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model.

Test-time Adaptation Visual Prompt Tuning

MeTA: Multi-source Test Time Adaptation

no code implementations 4 Jan 2024 Sk Miraj Ahmed, Fahim Faisal Niloy, Dripta S. Raychaudhuri, Samet Oymak, Amit K. Roy-Chowdhury

Test time adaptation is the process of adapting, in an unsupervised manner, a pre-trained source model to each incoming batch of the test data (i.e., without requiring a substantial portion of the test data to be available, as in traditional domain adaptation) and without access to the source data.

Test-time Adaptation
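
For context, a minimal PyTorch sketch of single-source test-time adaptation by entropy minimization on each incoming batch (in the spirit of TENT); this is a generic illustration of the setting described above, not the paper's multi-source MeTA method, and the model here is a stand-in.

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))  # stand-in for a pre-trained source model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def adapt_on_batch(x):
    """One unsupervised update per incoming test batch: minimize prediction entropy."""
    logits = model(x)
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return logits.argmax(dim=1)

for _ in range(5):                      # stream of test batches: no labels, no source data
    batch = torch.randn(8, 16)
    preds = adapt_on_batch(batch)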

Noise in the reverse process improves the approximation capabilities of diffusion models

no code implementations 13 Dec 2023 Karthik Elamvazhuthi, Samet Oymak, Fabio Pasqualetti

We use a control theoretic perspective by posing the approximation of the reverse process as a trajectory tracking problem.

Effective Restoration of Source Knowledge in Continual Test Time Adaptation

no code implementations 8 Nov 2023 Fahim Faisal Niloy, Sk Miraj Ahmed, Dripta S. Raychaudhuri, Samet Oymak, Amit K. Roy-Chowdhury

By restoring the knowledge from the source, it effectively corrects the negative consequences arising from the gradual deterioration of model parameters caused by ongoing shifts in the domain.

Change Detection Test-time Adaptation

Transformers as Support Vector Machines

1 code implementation 31 Aug 2023 Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak

In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer-products of token pairs.

Can Transformers Learn Optimal Filtering for Unknown Systems?

1 code implementation 16 Aug 2023 Haldun Balim, Zhe Du, Samet Oymak, Necmiye Ozay

Particularly, we train the transformer using various distinct systems and then evaluate the performance on unseen systems with unknown dynamics.

FedYolo: Augmenting Federated Learning with Pretrained Transformers

no code implementations 10 Jul 2023 Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak

These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): The clients load a full PTF model once and all future updates are accomplished through communication-efficient modules with limited catastrophic forgetting, where each task is assigned to its own module.

Federated Learning

Max-Margin Token Selection in Attention Mechanism

1 code implementation NeurIPS 2023 Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak

Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$.

On the Role of Attention in Prompt-tuning

no code implementations 6 Jun 2023 Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis

Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting.

Federated Multi-Sequence Stochastic Approximation with Local Hypergradient Estimation

1 code implementation 2 Jun 2023 Davoud Ataee Tarzanagh, Mingchen Li, Pranay Sharma, Samet Oymak

Stochastic approximation with multiple coupled sequences (MSA) has found broad applications in machine learning as it encompasses a rich class of problems including bilevel optimization (BLO), multi-level compositional optimization (MCO), and reinforcement learning (specifically, actor-critic methods).

Bilevel Optimization
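
A minimal (non-federated, deterministic-gradient) sketch of two coupled sequences in the bilevel spirit described above: an inner variable tracks the solution of a lower-level problem while the outer variable is updated with a hypergradient estimate built from the current inner iterate. The toy objective and all constants are illustrative assumptions.

import numpy as np

a = 2.0                 # lower level: minimize over y  g(x, y) = 0.5 * (y - a*x)**2  ->  y*(x) = a*x
x, y = 0.0, 0.0         # outer (upper-level) and inner (lower-level) iterates
alpha, beta = 0.05, 0.5 # outer and inner step sizes

for k in range(500):
    y = y - beta * (y - a * x)          # inner sequence: one gradient step on g(x, .)
    hypergrad = (y - 1.0) * a           # estimate of d/dx [0.5 * (y*(x) - 1)**2] using the current y
    x = x - alpha * hypergrad           # outer sequence: hypergradient step

print(round(x, 3), round(1.0 / a, 3))   # x approaches the upper-level solution 1/a = 0.5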

Learning on Manifolds: Universal Approximations Properties using Geometric Controllability Conditions for Neural ODEs

no code implementations 15 May 2023 Karthik Elamvazhuthi, Xuechen Zhang, Samet Oymak, Fabio Pasqualetti

To address this shortcoming, in this paper we study a class of neural ordinary differential equations that, by design, leave a given manifold invariant, and characterize their properties by leveraging the controllability properties of control affine systems.

Provable Pathways: Learning Multiple Tasks over Multiple Paths

no code implementations 8 Mar 2023 Yingcong Li, Samet Oymak

A traditional idea in multitask learning (MTL) is building a shared representation across tasks which can then be adapted to new tasks by tuning last layers.

Generalization Bounds

Stochastic Contextual Bandits with Long Horizon Rewards

no code implementations 2 Feb 2023 Yuzhen Qin, Yingcong Li, Fabio Pasqualetti, Maryam Fazel, Samet Oymak

The growing interest in complex decision-making and language modeling problems highlights the importance of sample-efficient learning over very long horizons.

Decision Making Language Modelling +1

Transformers as Algorithms: Generalization and Stability in In-context Learning

2 code implementations 17 Jan 2023 Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak

We first explore the statistical aspects of this abstraction through the lens of multitask learning: We obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d.

Generalization Bounds In-Context Learning +3

Finite Sample Identification of Bilinear Dynamical Systems

no code implementations 29 Aug 2022 Yahya Sattar, Samet Oymak, Necmiye Ozay

This motivates the problem of learning bilinear systems from a single trajectory of the system's states and inputs.

System Identification via Nuclear Norm Regularization

1 code implementation 30 Mar 2022 Yue Sun, Samet Oymak, Maryam Fazel

Hankel regularization encourages the low-rankness of the Hankel matrix, which maps to the low-orderness of the system.

Model Selection
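
A small numpy illustration of why the Hankel matrix is the relevant object here (assuming scipy is available for the Hankel construction): the Hankel matrix built from the impulse response of a low-order LTI system has rank equal to the system order, so its singular values reveal the order. The paper solves a nuclear-norm-regularized fit from data; this noiseless construction only shows the rank/order connection.

import numpy as np
from scipy.linalg import hankel

# Impulse response (Markov parameters C A^k B) of a stable order-2 system.
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])

h = [(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(20)]
H = hankel(h[:10], h[9:19])                       # 10 x 10 Hankel matrix of the impulse response

sv = np.linalg.svd(H, compute_uv=False)
print(np.round(sv[:5], 6))                        # only ~2 nonzero singular values: rank = system order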

Provable and Efficient Continual Representation Learning

1 code implementation 3 Mar 2022 Yingcong Li, Mingchen Li, M. Salman Asif, Samet Oymak

In continual learning (CL), the goal is to design models that can learn a sequence of tasks without catastrophic forgetting.

Continual Learning Representation Learning

Towards Sample-efficient Overparameterized Meta-learning

1 code implementation NeurIPS 2021 Yue Sun, Adhyyan Narang, Halil Ibrahim Gulluk, Samet Oymak, Maryam Fazel

Specifically, for (1), we first show that learning the optimal representation coincides with the problem of designing a task-aware regularization to promote inductive bias.

Few-Shot Learning Inductive Bias

AutoBalance: Optimized Loss Functions for Imbalanced Data

1 code implementation NeurIPS 2021 Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, Samet Oymak

Our experimental findings are complemented with theoretical insights on loss function design and the benefits of train-validation split.

Data Augmentation Fairness

Identification and Adaptive Control of Markov Jump Systems: Sample Complexity and Regret Bounds

no code implementations 13 Nov 2021 Yahya Sattar, Zhe Du, Davoud Ataee Tarzanagh, Laura Balzano, Necmiye Ozay, Samet Oymak

Combining our sample complexity results with recent perturbation results for certainty equivalent control, we prove that when the episode lengths are appropriately chosen, the proposed adaptive control scheme achieves $\mathcal{O}(\sqrt{T})$ regret, which can be improved to $\mathcal{O}(\mathrm{polylog}(T))$ with partial knowledge of the system.

Post-hoc Models for Performance Estimation of Machine Learning Inference

no code implementations 6 Oct 2021 Xuechen Zhang, Samet Oymak, Jiasi Chen

Estimating how well a machine learning model performs during inference is critical in a variety of scenarios (for example, to quantify uncertainty, or to choose from a library of available models).

BIG-bench Machine Learning Feature Engineering +3

Certainty Equivalent Quadratic Control for Markov Jump Systems

no code implementations 26 May 2021 Zhe Du, Yahya Sattar, Davoud Ataee Tarzanagh, Laura Balzano, Samet Oymak, Necmiye Ozay

Real-world control applications often involve complex dynamics subject to abrupt changes or variations.

Generalization Guarantees for Neural Architecture Search with Train-Validation Split

no code implementations 29 Apr 2021 Samet Oymak, Mingchen Li, Mahdi Soltanolkotabi

In this approach, it is common to use bilevel optimization where one optimizes the model weights over the training data (inner problem) and various hyperparameters such as the configuration of the architecture over the validation data (outer problem).

Bilevel Optimization Generalization Bounds +2

Unsupervised Multi-source Domain Adaptation Without Access to Source Data

1 code implementation CVPR 2021 Sk Miraj Ahmed, Dripta S. Raychaudhuri, Sujoy Paul, Samet Oymak, Amit K. Roy-Chowdhury

A recent line of work addressed this problem and proposed an algorithm that transfers knowledge to the unlabeled target domain from a single source model without requiring access to the source data.

Unsupervised Domain Adaptation

Provable Super-Convergence with a Large Cyclical Learning Rate

no code implementations 22 Feb 2021 Samet Oymak

Conventional wisdom dictates that the learning rate should be in the stable regime so that gradient-based algorithms don't blow up.

Sample Efficient Subspace-based Representations for Nonlinear Meta-Learning

no code implementations 14 Feb 2021 Halil Ibrahim Gulluk, Yue Sun, Samet Oymak, Maryam Fazel

We prove that subspace-based representations can be learned in a sample-efficient manner and provably benefit future tasks in terms of sample complexity.

Binary Classification General Classification +2
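
A minimal numpy sketch of the subspace idea (illustrative, not the paper's estimator): task parameter vectors lie in a shared low-dimensional subspace, the subspace is estimated from many source tasks via an SVD, and a new task is then solved with few samples inside that subspace.

import numpy as np

rng = np.random.default_rng(0)
d, r, tasks, n_src, n_new = 50, 3, 30, 100, 8
U = np.linalg.qr(rng.normal(size=(d, r)))[0]          # ground-truth shared subspace

# Estimate each source task's parameter by least squares, then extract the subspace by SVD.
thetas = []
for _ in range(tasks):
    w = U @ rng.normal(size=r)                         # task parameter lies in the subspace
    X = rng.normal(size=(n_src, d))
    y = X @ w + 0.1 * rng.normal(size=n_src)
    thetas.append(np.linalg.lstsq(X, y, rcond=None)[0])
U_hat = np.linalg.svd(np.stack(thetas, axis=1), full_matrices=False)[0][:, :r]

# New task: only n_new << d samples, solved in the r-dimensional learned subspace.
w_new = U @ rng.normal(size=r)
X = rng.normal(size=(n_new, d))
y = X @ w_new + 0.1 * rng.normal(size=n_new)
coef = np.linalg.lstsq(X @ U_hat, y, rcond=None)[0]
print(round(float(np.linalg.norm(U_hat @ coef - w_new) / np.linalg.norm(w_new)), 3))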

On the Marginal Benefit of Active Learning: Does Self-Supervision Eat Its Cake?

no code implementations 16 Nov 2020 Yao-Chun Chan, Mingchen Li, Samet Oymak

In parallel, recent developments in self-supervised and semi-supervised learning (S4L) provide powerful techniques, based on data augmentation, contrastive learning, and self-training, that enable superior utilization of unlabeled data, which has led to a significant reduction in the labeling required on standard machine learning benchmarks.

Active Learning Contrastive Learning +1

Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View

no code implementations NeurIPS 2020 Christos Thrampoulidis, Samet Oymak, Mahdi Soltanolkotabi

Our theoretical analysis allows us to precisely characterize how the test error varies over different training algorithms, data distributions, problem dimensions as well as number of classes, inter/intra class correlations and class priors.

Binary Classification Classification +2

Unsupervised Paraphrasing via Deep Reinforcement Learning

no code implementations 5 Jul 2020 A. B. Siddique, Samet Oymak, Vagelis Hristidis

Our evaluation also shows that PUP achieves a great trade-off between semantic similarity and diversity of expression.

Image Captioning Paraphrase Generation +5

Exploring Weight Importance and Hessian Bias in Model Pruning

no code implementations 19 Jun 2020 Mingchen Li, Yahya Sattar, Christos Thrampoulidis, Samet Oymak

Model pruning is an essential procedure for building compact and computationally-efficient machine learning models.

Statistical and Algorithmic Insights for Semi-supervised Learning with Self-training

no code implementations 19 Jun 2020 Samet Oymak, Talha Cihad Gulcu

We then establish a connection between self-training-based semi-supervision and the more general problem of learning with heterogeneous data and weak supervision.

Clustering

On the Role of Dataset Quality and Heterogeneity in Model Confidence

no code implementations 23 Feb 2020 Yuan Zhao, Jiasi Chen, Samet Oymak

We demonstrate that this leads to heterogeneous confidence/accuracy behavior in the test data and is poorly handled by the standard calibration algorithms.

Non-asymptotic and Accurate Learning of Nonlinear Dynamical Systems

no code implementations 20 Feb 2020 Yahya Sattar, Samet Oymak

If the system is run by a stabilizing input policy, we show that temporally-dependent samples can be approximated by i.i.d.

Generalization Guarantees for Neural Nets via Harnessing the Low-rankness of Jacobian

no code implementations 25 Sep 2019 Samet Oymak, Zalan Fabian, Mingchen Li, Mahdi Soltanolkotabi

We show that over the information space, learning is fast and one can quickly train a model with zero training loss that can also generalize well.

Quickly Finding the Best Linear Model in High Dimensions

no code implementations 3 Jul 2019 Yahya Sattar, Samet Oymak

We propose a projected gradient descent (PGD) algorithm to estimate the population minimizer in the finite sample regime.

Vocal Bursts Intensity Prediction
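
A minimal numpy sketch of projected gradient descent for a structured linear model, here with a sparsity constraint and a hard-thresholding projection; the constraint set, dimensions, and step size are illustrative assumptions rather than the paper's general setting.

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 500, 10
theta_star = np.zeros(d)
theta_star[rng.choice(d, k, replace=False)] = rng.normal(size=k)
X = rng.normal(size=(n, d)) / np.sqrt(n)
y = X @ theta_star + 0.01 * rng.normal(size=n)

def project_sparse(v, k):
    """Projection onto k-sparse vectors: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

theta = np.zeros(d)
step = 1.0 / np.linalg.norm(X, 2) ** 2                 # gradient step size from the spectral norm
for _ in range(300):
    theta = project_sparse(theta - step * X.T @ (X @ theta - y), k)

print(round(float(np.linalg.norm(theta - theta_star) / np.linalg.norm(theta_star)), 3))  # relative error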

Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian

no code implementations 12 Jun 2019 Samet Oymak, Zalan Fabian, Mingchen Li, Mahdi Soltanolkotabi

We show that over the information space, learning is fast and one can quickly train a model with zero training loss that can also generalize well.

Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks

1 code implementation 27 Mar 2019 Mingchen Li, Mahdi Soltanolkotabi, Samet Oymak

In particular, we prove that: (i) in the first few iterations, where the updates are still in the vicinity of the initialization, gradient descent only fits the correct labels, essentially ignoring the noisy labels.

Towards moderate overparameterization: global convergence guarantees for training shallow neural networks

no code implementations 12 Feb 2019 Samet Oymak, Mahdi Soltanolkotabi

However, in practice much more moderate levels of overparameterization seem to be sufficient, and in many cases overparameterized models seem to perfectly interpolate the training data as soon as the number of parameters exceeds the size of the training data by a constant factor.

Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?

no code implementations 25 Dec 2018 Samet Oymak, Mahdi Soltanolkotabi

In this paper we demonstrate that when the loss has certain properties over a minimally small neighborhood of the initial point, first order methods such as (stochastic) gradient descent have a few intriguing properties: (1) the iterates converge at a geometric rate to a global optimum even when the loss is nonconvex, (2) among all global optima of the loss, the iterates converge to one with a near-minimal distance to the initial point, (3) the iterates take a near-direct route from the initial point to this global optimum.

Stochastic Gradient Descent Learns State Equations with Nonlinear Activations

no code implementations ICLR 2019 Samet Oymak

We study discrete time dynamical systems governed by the state equation $h_{t+1}=\phi(Ah_t+Bu_t)$.
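
A minimal numpy sketch of the setup in the sentence above: simulate $h_{t+1}=\phi(Ah_t+Bu_t)$ with $\phi=\tanh$ and recover $A$ and $B$ by stochastic gradient descent on the one-step prediction loss. Dimensions, step size, and the choice of $\phi$ are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n, p, T = 4, 2, 2000
A_star = 0.5 * np.linalg.qr(rng.normal(size=(n, n)))[0]   # stable true state matrix (spectral norm 0.5)
B_star = rng.normal(size=(n, p))
phi = np.tanh

# Roll out one trajectory driven by random inputs u_t.
h = np.zeros((T + 1, n))
u = rng.normal(size=(T, p))
for t in range(T):
    h[t + 1] = phi(A_star @ h[t] + B_star @ u[t])

# SGD on the one-step loss 0.5 * ||h_{t+1} - phi(A h_t + B u_t)||^2.
A, B = np.zeros((n, n)), np.zeros((n, p))
lr = 0.1
for epoch in range(20):
    for t in rng.permutation(T):
        z = A @ h[t] + B @ u[t]
        g = (phi(z) - h[t + 1]) * (1.0 - np.tanh(z) ** 2)  # chain rule through phi = tanh
        A -= lr * np.outer(g, h[t])
        B -= lr * np.outer(g, u[t])

print(round(float(np.linalg.norm(A - A_star)), 3), round(float(np.linalg.norm(B - B_star)), 3))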

Non-asymptotic Identification of LTI Systems from a Single Trajectory

1 code implementation 14 Jun 2018 Samet Oymak, Necmiye Ozay

We consider the problem of learning a realization for a linear time-invariant (LTI) dynamical system from input/output data.

End-to-end Learning of a Convolutional Neural Network via Deep Tensor Decomposition

no code implementations 16 May 2018 Samet Oymak, Mahdi Soltanolkotabi

In this paper we study the problem of learning the weights of a deep convolutional neural network.

Tensor Decomposition

Learning Compact Neural Networks with Regularization

no code implementations ICML 2018 Samet Oymak

Proper regularization is critical for speeding up training, improving generalization performance, and learning compact models that are cost efficient.

Network Pruning

Learning Feature Nonlinearities with Non-Convex Regularized Binned Regression

no code implementations 20 May 2017 Samet Oymak, Mehrdad Mahdavi, Jiasi Chen

Evaluations on synthetic and real datasets demonstrate that the algorithm is competitive with the current state-of-the-art and accurately learns feature nonlinearities.

regression

Fast and Reliable Parameter Estimation from Nonlinear Observations

no code implementations 23 Oct 2016 Samet Oymak, Mahdi Soltanolkotabi

In this paper we study the problem of recovering a structured but unknown parameter ${\bf{\theta}}^*$ from $n$ nonlinear observations of the form $y_i=f(\langle {\bf{x}}_i,{\bf{\theta}}^*\rangle)$ for $i=1, 2,\ldots, n$.
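
For intuition, a numpy sketch of a classical fact relevant to this setting (a Brillinger/Stein-type argument, not the paper's estimator): with Gaussian measurement vectors and a nonlinear link $f$, the simple correlation estimator $\frac{1}{n}\sum_i y_i {\bf{x}}_i$ recovers the direction of ${\bf{\theta}}^*$ even though $f$ is never used.

import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 20
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

f = lambda t: np.tanh(3 * t)                     # nonlinear link, unknown to the estimator
X = rng.normal(size=(n, d))                      # Gaussian measurement vectors
y = f(X @ theta_star)

theta_hat = X.T @ y / n                          # correlation estimator (no knowledge of f)
cosine = theta_hat @ theta_star / np.linalg.norm(theta_hat)
print(round(float(cosine), 4))                   # close to 1: the direction is recovered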

Near-Optimal Bounds for Binary Embeddings of Arbitrary Sets

no code implementations 14 Dec 2015 Samet Oymak, Ben Recht

We characterize the tradeoff between distortion and sample complexity $m$ in terms of the Gaussian width $\omega(K)$ of the set.
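
A quick numpy illustration of the object being studied (standard sign random projections, not the paper's refined bounds): the normalized Hamming distance between the binary embeddings of two points concentrates around the angle between them divided by $\pi$.

import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 20000                                  # ambient dimension, number of binary measurements
a = rng.normal(size=d); a /= np.linalg.norm(a)
b = rng.normal(size=d); b /= np.linalg.norm(b)

G = rng.normal(size=(m, d))                       # Gaussian projection, then one-bit quantization
ha, hb = np.sign(G @ a), np.sign(G @ b)

hamming = np.mean(ha != hb)
angle = np.arccos(np.clip(a @ b, -1.0, 1.0)) / np.pi
print(round(float(hamming), 4), round(float(angle), 4))   # the two numbers nearly match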

Universality laws for randomized dimension reduction, with applications

no code implementations 30 Nov 2015 Samet Oymak, Joel A. Tropp

In the Euclidean setting, one fundamental technique for dimension reduction is to apply a random linear map to the data.

Dimensionality Reduction

Parallel Correlation Clustering on Big Graphs

no code implementations NeurIPS 2015 Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael. I. Jordan

We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds and achieve nearly linear speedups, provably.

Clustering

Sharp Time--Data Tradeoffs for Linear Inverse Problems

no code implementations 16 Jul 2015 Samet Oymak, Benjamin Recht, Mahdi Soltanolkotabi

We sharply characterize the convergence rate associated with a wide variety of random measurement ensembles in terms of the number of measurements and structural complexity of the signal with respect to the chosen penalty function.

Isometric sketching of any set via the Restricted Isometry Property

no code implementations 11 Jun 2015 Samet Oymak, Benjamin Recht, Mahdi Soltanolkotabi

In this paper we show that, for the purposes of dimensionality reduction, a certain class of structured random matrices behaves similarly to random Gaussian matrices.

Dimensionality Reduction

The Squared-Error of Generalized LASSO: A Precise Analysis

no code implementations 4 Nov 2013 Samet Oymak, Christos Thrampoulidis, Babak Hassibi

The first LASSO estimator assumes a priori knowledge of $f(x_0)$ and is given by $\arg\min_{x}\{\|y-Ax\|_2~\text{subject to}~f(x)\leq f(x_0)\}$.
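
A small sketch of the first estimator quoted above, assuming cvxpy is available, with $f$ taken to be the $\ell_1$ norm and the oracle value $f(x_0)$ used as the constraint level; the problem sizes and noise level are illustrative.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, k = 80, 200, 5
x0 = np.zeros(d)
x0[rng.choice(d, k, replace=False)] = rng.normal(size=k)   # sparse ground truth
A = rng.normal(size=(n, d)) / np.sqrt(n)
y = A @ x0 + 0.01 * rng.normal(size=n)

x = cp.Variable(d)
problem = cp.Problem(cp.Minimize(cp.norm(y - A @ x, 2)),
                     [cp.norm(x, 1) <= np.linalg.norm(x0, 1)])   # f(x) <= f(x0) with f = l1 norm
problem.solve()
print(round(float(np.linalg.norm(x.value - x0) / np.linalg.norm(x0)), 3))   # relative recovery error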
