Search Results for author: Samet Oymak

Found 65 papers, 15 papers with code

TREACLE: Thrifty Reasoning via Context-Aware LLM and Prompt Selection

no code implementations 17 Apr 2024 Xuechen Zhang, Zijian Huang, Ege Onur Taga, Carlee Joe-Wong, Samet Oymak, Jiasi Chen

Recent successes in natural language processing have led to the proliferation of large language models (LLMs) by multiple providers.

GSM8K Navigate

Mechanics of Next Token Prediction with Self-Attention

no code implementations 12 Mar 2024 Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak

We show that training self-attention with gradient descent learns an automaton which generates the next token in two distinct steps: (1) Hard retrieval: given an input sequence, self-attention precisely selects the high-priority input tokens associated with the last input token.

Retrieval
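
A minimal numpy sketch (an illustration only, not the paper's construction) of the "hard retrieval" behavior described above: as the attention logits grow in scale, softmax attention concentrates its mass on the highest-scoring ("highest-priority") token.

import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, T = 8, 6
tokens = rng.normal(size=(T, d))        # input sequence x_1, ..., x_T
query = tokens[-1]                       # the last input token acts as the query
W = rng.normal(size=(d, d))              # combined key-query weights

scores = tokens @ W @ query              # attention logits ("priorities")
for scale in [1, 5, 50]:                 # growing scale mimics larger trained attention weights
    attn = softmax(scale * scores)
    print(scale, np.round(attn, 3))      # mass concentrates on argmax(scores): hard retrieval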

From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers

no code implementations 21 Feb 2024 M. Emrullah Ildiz, Yixiao Huang, Yingcong Li, Ankit Singh Rawat, Samet Oymak

Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation.

Text Generation

FLASH: Federated Learning Across Simultaneous Heterogeneities

no code implementations 13 Feb 2024 Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury

The key premise of federated learning (FL) is to train ML models across a diverse set of data-owners (clients), without exchanging local data.

Federated Learning Multi-Armed Bandits

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

2 code implementations 6 Feb 2024 Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos

State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention.

In-Context Learning Language Modelling +1
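
As an illustration of the ingredients named in the abstract above (gating and input-dependent token selection), here is a minimal numpy sketch of a gated, input-dependent linear recurrence; it is a toy, not Mamba's actual parameterization.

import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 10
x = rng.normal(size=(T, d))              # input token embeddings
Wa, Wb = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = np.zeros(d)
for t in range(T):
    a = sigmoid(x[t] @ Wa)               # input-dependent forget gate
    b = sigmoid(x[t] @ Wb)               # input-dependent input gate ("token selection")
    h = a * h + b * x[t]                 # linear-time recurrence, no quadratic attention
print(np.round(h, 3))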

Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective

no code implementations 25 Jan 2024 Xuechen Zhang, Mingchen Li, Jiasi Chen, Christos Thrampoulidis, Samet Oymak

Confirming this, under a Gaussian mixture setting, we show that the optimal SVM classifier for balanced accuracy needs to be adaptive to the class attributes.

Attribute Fairness
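
A small sketch of the phenomenon mentioned above, using scikit-learn (assumed available): on an imbalanced two-class Gaussian mixture, reweighting the loss by class frequency shifts the linear decision boundary and typically improves balanced accuracy. This is a generic class-prior adjustment for illustration, not the paper's class-attribute prior method.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
n_maj, n_min = 2000, 100                     # heavy class imbalance
X = np.vstack([rng.normal(-1.0, 1.0, size=(n_maj, 2)),
               rng.normal(+1.0, 1.0, size=(n_min, 2))])
y = np.array([0] * n_maj + [1] * n_min)

for weighting in (None, "balanced"):
    clf = LogisticRegression(class_weight=weighting).fit(X, y)
    print(weighting, round(balanced_accuracy_score(y, clf.predict(X)), 3))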

Plug-and-Play Transformer Modules for Test-Time Adaptation

no code implementations 6 Jan 2024 Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury

Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model.

Test-time Adaptation Visual Prompt Tuning

MeTA: Multi-source Test Time Adaptation

no code implementations 4 Jan 2024 Sk Miraj Ahmed, Fahim Faisal Niloy, Dripta S. Raychaudhuri, Samet Oymak, Amit K. Roy-Chowdhury

Test time adaptation is the process of adapting, in an unsupervised manner, a pre-trained source model to each incoming batch of the test data (i.e., without requiring a substantial portion of the test data to be available, as in traditional domain adaptation) and without access to the source data.

Test-time Adaptation
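
For context, a minimal PyTorch sketch of single-source test-time adaptation by entropy minimization on each incoming batch (in the spirit of TENT); this is a generic illustration of the setting described above, not the paper's multi-source MeTA method, and the model here is a stand-in.

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))  # stand-in for a pre-trained source model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def adapt_on_batch(x):
    """One unsupervised update per incoming test batch: minimize prediction entropy."""
    logits = model(x)
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return logits.argmax(dim=1)

for _ in range(5):                      # stream of test batches: no labels, no source data
    batch = torch.randn(8, 16)
    preds = adapt_on_batch(batch)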

Noise in the reverse process improves the approximation capabilities of diffusion models

no code implementations 13 Dec 2023 Karthik Elamvazhuthi, Samet Oymak, Fabio Pasqualetti

We use a control theoretic perspective by posing the approximation of the reverse process as a trajectory tracking problem.

Effective Restoration of Source Knowledge in Continual Test Time Adaptation

no code implementations 8 Nov 2023 Fahim Faisal Niloy, Sk Miraj Ahmed, Dripta S. Raychaudhuri, Samet Oymak, Amit K. Roy-Chowdhury

By restoring the knowledge from the source, it effectively corrects the negative consequences arising from the gradual deterioration of model parameters caused by ongoing shifts in the domain.

Change Detection Test-time Adaptation

Transformers as Support Vector Machines

1 code implementation 31 Aug 2023 Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak

In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer-products of token pairs.

Can Transformers Learn Optimal Filtering for Unknown Systems?

1 code implementation 16 Aug 2023 Haldun Balim, Zhe Du, Samet Oymak, Necmiye Ozay

Particularly, we train the transformer using various distinct systems and then evaluate the performance on unseen systems with unknown dynamics.

FedYolo: Augmenting Federated Learning with Pretrained Transformers

no code implementations 10 Jul 2023 Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak

These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): The clients load a full PTF model once and all future updates are accomplished through communication-efficient modules with limited catastrophic forgetting, where each task is assigned to its own module.

Federated Learning

Max-Margin Token Selection in Attention Mechanism

1 code implementation NeurIPS 2023 Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak

Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$.

On the Role of Attention in Prompt-tuning

no code implementations 6 Jun 2023 Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis

Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting.

Federated Multi-Sequence Stochastic Approximation with Local Hypergradient Estimation

1 code implementation 2 Jun 2023 Davoud Ataee Tarzanagh, Mingchen Li, Pranay Sharma, Samet Oymak

Stochastic approximation with multiple coupled sequences (MSA) has found broad applications in machine learning as it encompasses a rich class of problems including bilevel optimization (BLO), multi-level compositional optimization (MCO), and reinforcement learning (specifically, actor-critic methods).

Bilevel Optimization
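
A minimal (non-federated, deterministic-gradient) sketch of two coupled sequences in the bilevel spirit described above: an inner variable tracks the solution of a lower-level problem while the outer variable is updated with a hypergradient estimate built from the current inner iterate. The toy objective and all constants are illustrative assumptions.

import numpy as np

a = 2.0                 # lower level: minimize over y  g(x, y) = 0.5 * (y - a*x)**2  ->  y*(x) = a*x
x, y = 0.0, 0.0         # outer (upper-level) and inner (lower-level) iterates
alpha, beta = 0.05, 0.5 # outer and inner step sizes

for k in range(500):
    y = y - beta * (y - a * x)          # inner sequence: one gradient step on g(x, .)
    hypergrad = (y - 1.0) * a           # estimate of d/dx [0.5 * (y*(x) - 1)**2] using the current y
    x = x - alpha * hypergrad           # outer sequence: hypergradient step

print(round(x, 3), round(1.0 / a, 3))   # x approaches the upper-level solution 1/a = 0.5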

Learning on Manifolds: Universal Approximations Properties using Geometric Controllability Conditions for Neural ODEs

no code implementations 15 May 2023 Karthik Elamvazhuthi, Xuechen Zhang, Samet Oymak, Fabio Pasqualetti

To address this shortcoming, in this paper we study a class of neural ordinary differential equations that, by design, leave a given manifold invariant, and characterize their properties by leveraging the controllability properties of control affine systems.

Provable Pathways: Learning Multiple Tasks over Multiple Paths

no code implementations 8 Mar 2023 Yingcong Li, Samet Oymak

A traditional idea in multitask learning (MTL) is building a shared representation across tasks which can then be adapted to new tasks by tuning last layers.

Generalization Bounds

Stochastic Contextual Bandits with Long Horizon Rewards

no code implementations 2 Feb 2023 Yuzhen Qin, Yingcong Li, Fabio Pasqualetti, Maryam Fazel, Samet Oymak

The growing interest in complex decision-making and language modeling problems highlights the importance of sample-efficient learning over very long horizons.

Decision Making Language Modelling +1

Transformers as Algorithms: Generalization and Stability in In-context Learning

2 code implementations 17 Jan 2023 Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak

We first explore the statistical aspects of this abstraction through the lens of multitask learning: We obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d.

Generalization Bounds In-Context Learning +3

Finite Sample Identification of Bilinear Dynamical Systems

no code implementations 29 Aug 2022 Yahya Sattar, Samet Oymak, Necmiye Ozay

This motivates the problem of learning bilinear systems from a single trajectory of the system's states and inputs.

System Identification via Nuclear Norm Regularization

1 code implementation 30 Mar 2022 Yue Sun, Samet Oymak, Maryam Fazel

Hankel regularization encourages the low-rankness of the Hankel matrix, which maps to the low-orderness of the system.

Model Selection
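
A small numpy illustration of why the Hankel matrix is the relevant object here (assuming scipy is available for the Hankel construction): the Hankel matrix built from the impulse response of a low-order LTI system has rank equal to the system order, so its singular values reveal the order. The paper solves a nuclear-norm-regularized fit from data; this noiseless construction only shows the rank/order connection.

import numpy as np
from scipy.linalg import hankel

# Impulse response (Markov parameters C A^k B) of a stable order-2 system.
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])

h = [(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(20)]
H = hankel(h[:10], h[9:19])                       # 10 x 10 Hankel matrix of the impulse response

sv = np.linalg.svd(H, compute_uv=False)
print(np.round(sv[:5], 6))                        # only ~2 nonzero singular values: rank = system order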

Provable and Efficient Continual Representation Learning

1 code implementation 3 Mar 2022 Yingcong Li, Mingchen Li, M. Salman Asif, Samet Oymak

In continual learning (CL), the goal is to design models that can learn a sequence of tasks without catastrophic forgetting.

Continual Learning Representation Learning

Towards Sample-efficient Overparameterized Meta-learning

1 code implementation NeurIPS 2021 Yue Sun, Adhyyan Narang, Halil Ibrahim Gulluk, Samet Oymak, Maryam Fazel

Specifically, for (1), we first show that learning the optimal representation coincides with the problem of designing a task-aware regularization to promote inductive bias.

Few-Shot Learning Inductive Bias

AutoBalance: Optimized Loss Functions for Imbalanced Data

1 code implementation NeurIPS 2021 Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, Samet Oymak

Our experimental findings are complemented with theoretical insights on loss function design and the benefits of train-validation split.

Data Augmentation Fairness

Identification and Adaptive Control of Markov Jump Systems: Sample Complexity and Regret Bounds

no code implementations 13 Nov 2021 Yahya Sattar, Zhe Du, Davoud Ataee Tarzanagh, Laura Balzano, Necmiye Ozay, Samet Oymak

Combining our sample complexity results with recent perturbation results for certainty equivalent control, we prove that when the episode lengths are appropriately chosen, the proposed adaptive control scheme achieves $\mathcal{O}(\sqrt{T})$ regret, which can be improved to $\mathcal{O}(\mathrm{polylog}(T))$ with partial knowledge of the system.

Post-hoc Models for Performance Estimation of Machine Learning Inference

no code implementations 6 Oct 2021 Xuechen Zhang, Samet Oymak, Jiasi Chen

Estimating how well a machine learning model performs during inference is critical in a variety of scenarios (for example, to quantify uncertainty, or to choose from a library of available models).

BIG-bench Machine Learning Feature Engineering +3

Certainty Equivalent Quadratic Control for Markov Jump Systems

no code implementations 26 May 2021 Zhe Du, Yahya Sattar, Davoud Ataee Tarzanagh, Laura Balzano, Samet Oymak, Necmiye Ozay

Real-world control applications often involve complex dynamics subject to abrupt changes or variations.

Generalization Guarantees for Neural Architecture Search with Train-Validation Split

no code implementations 29 Apr 2021 Samet Oymak, Mingchen Li, Mahdi Soltanolkotabi

In this approach, it is common to use bilevel optimization where one optimizes the model weights over the training data (inner problem) and various hyperparameters such as the configuration of the architecture over the validation data (outer problem).

Bilevel Optimization Generalization Bounds +2

Unsupervised Multi-source Domain Adaptation Without Access to Source Data

1 code implementation CVPR 2021 Sk Miraj Ahmed, Dripta S. Raychaudhuri, Sujoy Paul, Samet Oymak, Amit K. Roy-Chowdhury

A recent line of work addressed this problem and proposed an algorithm that transfers knowledge to the unlabeled target domain from a single source model without requiring access to the source data.

Unsupervised Domain Adaptation

Provable Super-Convergence with a Large Cyclical Learning Rate

no code implementations 22 Feb 2021 Samet Oymak

Conventional wisdom dictates that the learning rate should be in the stable regime so that gradient-based algorithms don't blow up.

Sample Efficient Subspace-based Representations for Nonlinear Meta-Learning

no code implementations 14 Feb 2021 Halil Ibrahim Gulluk, Yue Sun, Samet Oymak, Maryam Fazel

We prove that subspace-based representations can be learned in a sample-efficient manner and provably benefit future tasks in terms of sample complexity.

Binary Classification General Classification +2
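
A minimal numpy sketch of the subspace idea (illustrative, not the paper's estimator): task parameter vectors lie in a shared low-dimensional subspace, the subspace is estimated from many source tasks via an SVD, and a new task is then solved with few samples inside that subspace.

import numpy as np

rng = np.random.default_rng(0)
d, r, tasks, n_src, n_new = 50, 3, 30, 100, 8
U = np.linalg.qr(rng.normal(size=(d, r)))[0]          # ground-truth shared subspace

# Estimate each source task's parameter by least squares, then extract the subspace by SVD.
thetas = []
for _ in range(tasks):
    w = U @ rng.normal(size=r)                         # task parameter lies in the subspace
    X = rng.normal(size=(n_src, d))
    y = X @ w + 0.1 * rng.normal(size=n_src)
    thetas.append(np.linalg.lstsq(X, y, rcond=None)[0])
U_hat = np.linalg.svd(np.stack(thetas, axis=1), full_matrices=False)[0][:, :r]

# New task: only n_new << d samples, solved in the r-dimensional learned subspace.
w_new = U @ rng.normal(size=r)
X = rng.normal(size=(n_new, d))
y = X @ w_new + 0.1 * rng.normal(size=n_new)
coef = np.linalg.lstsq(X @ U_hat, y, rcond=None)[0]
print(round(float(np.linalg.norm(U_hat @ coef - w_new) / np.linalg.norm(w_new)), 3))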

On the Marginal Benefit of Active Learning: Does Self-Supervision Eat Its Cake?

no code implementations 16 Nov 2020 Yao-Chun Chan, Mingchen Li, Samet Oymak

In parallel, recent developments in self-supervised and semi-supervised learning (S4L) provide powerful techniques, based on data augmentation, contrastive learning, and self-training, that enable superior utilization of unlabeled data, which has led to a significant reduction in the labeling required on standard machine learning benchmarks.

Active Learning Contrastive Learning +1

Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View

no code implementations NeurIPS 2020 Christos Thrampoulidis, Samet Oymak, Mahdi Soltanolkotabi

Our theoretical analysis allows us to precisely characterize how the test error varies over different training algorithms, data distributions, problem dimensions as well as number of classes, inter/intra class correlations and class priors.

Binary Classification Classification +2

Unsupervised Paraphrasing via Deep Reinforcement Learning

no code implementations 5 Jul 2020 A. B. Siddique, Samet Oymak, Vagelis Hristidis

Our evaluation also shows that PUP achieves a great trade-off between semantic similarity and diversity of expression.

Image Captioning Paraphrase Generation +5

Exploring Weight Importance and Hessian Bias in Model Pruning

no code implementations 19 Jun 2020 Mingchen Li, Yahya Sattar, Christos Thrampoulidis, Samet Oymak

Model pruning is an essential procedure for building compact and computationally-efficient machine learning models.

Statistical and Algorithmic Insights for Semi-supervised Learning with Self-training

no code implementations 19 Jun 2020 Samet Oymak, Talha Cihad Gulcu

We then establish a connection between self-training-based semi-supervision and the more general problem of learning with heterogeneous data and weak supervision.

Clustering

On the Role of Dataset Quality and Heterogeneity in Model Confidence

no code implementations 23 Feb 2020 Yuan Zhao, Jiasi Chen, Samet Oymak

We demonstrate that this leads to heterogeneous confidence/accuracy behavior in the test data and is poorly handled by the standard calibration algorithms.

Non-asymptotic and Accurate Learning of Nonlinear Dynamical Systems

no code implementations 20 Feb 2020 Yahya Sattar, Samet Oymak

If the system is run by a stabilizing input policy, we show that temporally-dependent samples can be approximated by i.i.d.

Generalization Guarantees for Neural Nets via Harnessing the Low-rankness of Jacobian

no code implementations 25 Sep 2019 Samet Oymak, Zalan Fabian, Mingchen Li, Mahdi Soltanolkotabi

We show that over the information space, learning is fast and one can quickly train a model with zero training loss that can also generalize well.

Quickly Finding the Best Linear Model in High Dimensions

no code implementations 3 Jul 2019 Yahya Sattar, Samet Oymak

We propose a projected gradient descent (PGD) algorithm to estimate the population minimizer in the finite sample regime.

Vocal Bursts Intensity Prediction
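
A minimal numpy sketch of projected gradient descent for a structured linear model, here with a sparsity constraint and a hard-thresholding projection; the constraint set, dimensions, and step size are illustrative assumptions rather than the paper's general setting.

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 500, 10
theta_star = np.zeros(d)
theta_star[rng.choice(d, k, replace=False)] = rng.normal(size=k)
X = rng.normal(size=(n, d)) / np.sqrt(n)
y = X @ theta_star + 0.01 * rng.normal(size=n)

def project_sparse(v, k):
    """Projection onto k-sparse vectors: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

theta = np.zeros(d)
step = 1.0 / np.linalg.norm(X, 2) ** 2                 # gradient step size from the spectral norm
for _ in range(300):
    theta = project_sparse(theta - step * X.T @ (X @ theta - y), k)

print(round(float(np.linalg.norm(theta - theta_star) / np.linalg.norm(theta_star)), 3))  # relative error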

Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian

no code implementations 12 Jun 2019 Samet Oymak, Zalan Fabian, Mingchen Li, Mahdi Soltanolkotabi

We show that over the information space, learning is fast and one can quickly train a model with zero training loss that can also generalize well.

Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks

1 code implementation 27 Mar 2019 Mingchen Li, Mahdi Soltanolkotabi, Samet Oymak

In particular, we prove that: (i) in the first few iterations, where the updates are still in the vicinity of the initialization, gradient descent only fits the correct labels, essentially ignoring the noisy labels.

Towards moderate overparameterization: global convergence guarantees for training shallow neural networks

no code implementations 12 Feb 2019 Samet Oymak, Mahdi Soltanolkotabi

However, in practice much more moderate levels of overparameterization seem to be sufficient, and in many cases overparameterized models seem to perfectly interpolate the training data as soon as the number of parameters exceeds the size of the training data by a constant factor.

Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?

no code implementations 25 Dec 2018 Samet Oymak, Mahdi Soltanolkotabi

In this paper we demonstrate that when the loss has certain properties over a minimally small neighborhood of the initial point, first order methods such as (stochastic) gradient descent have a few intriguing properties: (1) the iterates converge at a geometric rate to a global optimum even when the loss is nonconvex, (2) among all global optima of the loss, the iterates converge to one with a near-minimal distance to the initial point, (3) the iterates take a near-direct route from the initial point to this global optimum.

Stochastic Gradient Descent Learns State Equations with Nonlinear Activations

no code implementations ICLR 2019 Samet Oymak

We study discrete time dynamical systems governed by the state equation $h_{t+1}=\phi(Ah_t+Bu_t)$.
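
A minimal numpy sketch of the setup in the sentence above: simulate $h_{t+1}=\phi(Ah_t+Bu_t)$ with $\phi=\tanh$ and recover $A$ and $B$ by stochastic gradient descent on the one-step prediction loss. Dimensions, step size, and the choice of $\phi$ are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n, p, T = 4, 2, 2000
A_star = 0.5 * np.linalg.qr(rng.normal(size=(n, n)))[0]   # stable true state matrix (spectral norm 0.5)
B_star = rng.normal(size=(n, p))
phi = np.tanh

# Roll out one trajectory driven by random inputs u_t.
h = np.zeros((T + 1, n))
u = rng.normal(size=(T, p))
for t in range(T):
    h[t + 1] = phi(A_star @ h[t] + B_star @ u[t])

# SGD on the one-step loss 0.5 * ||h_{t+1} - phi(A h_t + B u_t)||^2.
A, B = np.zeros((n, n)), np.zeros((n, p))
lr = 0.1
for epoch in range(20):
    for t in rng.permutation(T):
        z = A @ h[t] + B @ u[t]
        g = (phi(z) - h[t + 1]) * (1.0 - np.tanh(z) ** 2)  # chain rule through phi = tanh
        A -= lr * np.outer(g, h[t])
        B -= lr * np.outer(g, u[t])

print(round(float(np.linalg.norm(A - A_star)), 3), round(float(np.linalg.norm(B - B_star)), 3))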

Non-asymptotic Identification of LTI Systems from a Single Trajectory

1 code implementation 14 Jun 2018 Samet Oymak, Necmiye Ozay

We consider the problem of learning a realization for a linear time-invariant (LTI) dynamical system from input/output data.

End-to-end Learning of a Convolutional Neural Network via Deep Tensor Decomposition

no code implementations 16 May 2018 Samet Oymak, Mahdi Soltanolkotabi

In this paper we study the problem of learning the weights of a deep convolutional neural network.

Tensor Decomposition

Learning Compact Neural Networks with Regularization

no code implementations ICML 2018 Samet Oymak

Proper regularization is critical for speeding up training, improving generalization performance, and learning compact models that are cost efficient.

Network Pruning

Learning Feature Nonlinearities with Non-Convex Regularized Binned Regression

no code implementations 20 May 2017 Samet Oymak, Mehrdad Mahdavi, Jiasi Chen

Evaluations on synthetic and real datasets demonstrate that the algorithm is competitive with the current state-of-the-art and accurately learns feature nonlinearities.

regression

Fast and Reliable Parameter Estimation from Nonlinear Observations

no code implementations 23 Oct 2016 Samet Oymak, Mahdi Soltanolkotabi

In this paper we study the problem of recovering a structured but unknown parameter ${\bf{\theta}}^*$ from $n$ nonlinear observations of the form $y_i=f(\langle {\bf{x}}_i,{\bf{\theta}}^*\rangle)$ for $i=1, 2,\ldots, n$.
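
For intuition, a numpy sketch of a classical fact relevant to this setting (a Brillinger/Stein-type argument, not the paper's estimator): with Gaussian measurement vectors and a nonlinear link $f$, the simple correlation estimator $\frac{1}{n}\sum_i y_i {\bf{x}}_i$ recovers the direction of ${\bf{\theta}}^*$ even though $f$ is never used.

import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 20
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

f = lambda t: np.tanh(3 * t)                     # nonlinear link, unknown to the estimator
X = rng.normal(size=(n, d))                      # Gaussian measurement vectors
y = f(X @ theta_star)

theta_hat = X.T @ y / n                          # correlation estimator (no knowledge of f)
cosine = theta_hat @ theta_star / np.linalg.norm(theta_hat)
print(round(float(cosine), 4))                   # close to 1: the direction is recovered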

Near-Optimal Bounds for Binary Embeddings of Arbitrary Sets

no code implementations 14 Dec 2015 Samet Oymak, Ben Recht

We characterize the tradeoff between distortion and sample complexity $m$ in terms of the Gaussian width $\omega(K)$ of the set.
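
A quick numpy illustration of the object being studied (standard sign random projections, not the paper's refined bounds): the normalized Hamming distance between the binary embeddings of two points concentrates around the angle between them divided by $\pi$.

import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 20000                                  # ambient dimension, number of binary measurements
a = rng.normal(size=d); a /= np.linalg.norm(a)
b = rng.normal(size=d); b /= np.linalg.norm(b)

G = rng.normal(size=(m, d))                       # Gaussian projection, then one-bit quantization
ha, hb = np.sign(G @ a), np.sign(G @ b)

hamming = np.mean(ha != hb)
angle = np.arccos(np.clip(a @ b, -1.0, 1.0)) / np.pi
print(round(float(hamming), 4), round(float(angle), 4))   # the two numbers nearly match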

Universality laws for randomized dimension reduction, with applications

no code implementations 30 Nov 2015 Samet Oymak, Joel A. Tropp

In the Euclidean setting, one fundamental technique for dimension reduction is to apply a random linear map to the data.

Dimensionality Reduction

Parallel Correlation Clustering on Big Graphs

no code implementations NeurIPS 2015 Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael. I. Jordan

We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds and achieve nearly linear speedups, provably.

Clustering

Sharp Time--Data Tradeoffs for Linear Inverse Problems

no code implementations 16 Jul 2015 Samet Oymak, Benjamin Recht, Mahdi Soltanolkotabi

We sharply characterize the convergence rate associated with a wide variety of random measurement ensembles in terms of the number of measurements and structural complexity of the signal with respect to the chosen penalty function.

Isometric sketching of any set via the Restricted Isometry Property

no code implementations 11 Jun 2015 Samet Oymak, Benjamin Recht, Mahdi Soltanolkotabi

In this paper we show that, for the purposes of dimensionality reduction, a certain class of structured random matrices behaves similarly to random Gaussian matrices.

Dimensionality Reduction

The Squared-Error of Generalized LASSO: A Precise Analysis

no code implementations 4 Nov 2013 Samet Oymak, Christos Thrampoulidis, Babak Hassibi

The first LASSO estimator assumes a priori knowledge of $f(x_0)$ and is given by $\arg\min_{x}\{\|y-Ax\|_2~\text{subject to}~f(x)\leq f(x_0)\}$.
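
A small sketch of the first estimator quoted above, assuming cvxpy is available, with $f$ taken to be the $\ell_1$ norm and the oracle value $f(x_0)$ used as the constraint level; the problem sizes and noise level are illustrative.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, k = 80, 200, 5
x0 = np.zeros(d)
x0[rng.choice(d, k, replace=False)] = rng.normal(size=k)   # sparse ground truth
A = rng.normal(size=(n, d)) / np.sqrt(n)
y = A @ x0 + 0.01 * rng.normal(size=n)

x = cp.Variable(d)
problem = cp.Problem(cp.Minimize(cp.norm(y - A @ x, 2)),
                     [cp.norm(x, 1) <= np.linalg.norm(x0, 1)])   # f(x) <= f(x0) with f = l1 norm
problem.solve()
print(round(float(np.linalg.norm(x.value - x0) / np.linalg.norm(x0)), 3))   # relative recovery error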
