no code implementations • 17 Apr 2024 • Xuechen Zhang, Zijian Huang, Ege Onur Taga, Carlee Joe-Wong, Samet Oymak, Jiasi Chen
Recent successes in natural language processing have led to the proliferation of large language models (LLMs) by multiple providers.
no code implementations • 12 Mar 2024 • Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak
We show that training self-attention with gradient descent learns an automaton which generates the next token in two distinct steps: $\textbf{(1) Hard retrieval:}$ Given the input sequence, self-attention precisely selects the $\textit{high-priority input tokens}$ associated with the last input token.
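As a toy illustration of the hard-retrieval step (an assumption for exposition, not the paper's construction), the NumPy sketch below shows how sharpening the attention scores drives the softmax toward a one-hot selection of the highest-scoring token for the last input token.

```python
# Toy sketch (illustrative, not the paper's construction): as attention
# scores are sharpened, softmax concentrates on the single highest-priority
# token associated with the last input token.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))       # 5 input tokens of dimension 8
W = rng.normal(size=(8, 8))            # illustrative attention weights
scores = tokens @ W @ tokens[-1]       # each token scored against the last token
for scale in [1.0, 10.0, 100.0]:
    print(scale, np.round(softmax(scale * scores), 3))  # tends to one-hot
```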
no code implementations • 21 Feb 2024 • M. Emrullah Ildiz, Yixiao Huang, Yingcong Li, Ankit Singh Rawat, Samet Oymak
Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation.
no code implementations • 13 Feb 2024 • Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury
The key premise of federated learning (FL) is to train ML models across a diverse set of data-owners (clients), without exchanging local data.
2 code implementations • 6 Feb 2024 • Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention.
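For readers unfamiliar with selective SSMs, here is a minimal NumPy sketch of an input-dependent, gated state-space recurrence; the weight names, shapes, and nonlinearities are illustrative choices and not the Mamba implementation.

```python
# Minimal sketch of an input-dependent (selective) state-space recurrence
# with gating; all parameters below are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
d_state, d_in, T = 8, 4, 16
W_a = rng.normal(size=(d_state, d_in))    # produces an input-dependent decay
W_b = rng.normal(size=(d_state, d_in))    # produces an input-dependent update
W_g = rng.normal(size=(d_state, d_in))    # produces a gate in (0, 1)

def step(h, x):
    a = np.tanh(W_a @ x)                  # token-dependent state transition
    b = W_b @ x
    g = 1.0 / (1.0 + np.exp(-(W_g @ x)))  # gate decides how much to update
    return g * (a * h + b) + (1.0 - g) * h

h = np.zeros(d_state)
for x in rng.normal(size=(T, d_in)):      # process a toy sequence of T tokens
    h = step(h, x)                        # linear-time in the sequence length
```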
no code implementations • 25 Jan 2024 • Xuechen Zhang, Mingchen Li, Jiasi Chen, Christos Thrampoulidis, Samet Oymak
Confirming this, under a Gaussian mixture setting, we show that the optimal SVM classifier for balanced accuracy needs to be adaptive to the class attributes.
no code implementations • 6 Jan 2024 • Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury
Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model.
no code implementations • 4 Jan 2024 • Sk Miraj Ahmed, Fahim Faisal Niloy, Dripta S. Raychaudhuri, Samet Oymak, Amit K. Roy-Chowdhury
Test time adaptation is the process of adapting, in an unsupervised manner, a pre-trained source model to each incoming batch of the test data (i.e., without requiring a substantial portion of the test data to be available, as in traditional domain adaptation) and without access to the source data.
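The sketch below is a generic test-time adaptation loop (illustrative only; not this paper's algorithm): the pre-trained model is adapted to each incoming test batch by minimizing prediction entropy, with no labels and no source data.

```python
# Generic test-time adaptation sketch (not this paper's method): adapt the
# model on each unlabeled test batch by minimizing prediction entropy.
import torch
import torch.nn as nn

def adapt_on_batch(model, optimizer, x_batch, steps=1):
    model.train()
    for _ in range(steps):
        probs = model(x_batch).softmax(dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
    return model(x_batch).argmax(dim=1)          # predictions after adaptation

model = nn.Linear(10, 3)                         # stand-in pre-trained classifier
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
preds = adapt_on_batch(model, opt, torch.randn(32, 10))
```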
no code implementations • 13 Dec 2023 • Karthik Elamvazhuthi, Samet Oymak, Fabio Pasqualetti
We take a control-theoretic perspective by posing the approximation of the reverse process as a trajectory-tracking problem.
no code implementations • 8 Nov 2023 • Fahim Faisal Niloy, Sk Miraj Ahmed, Dripta S. Raychaudhuri, Samet Oymak, Amit K. Roy-Chowdhury
By restoring the knowledge from the source, it effectively corrects the negative consequences arising from the gradual deterioration of model parameters caused by ongoing shifts in the domain.
1 code implementation • 31 Aug 2023 • Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak
In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer-products of token pairs.
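Schematically, the described equivalence can be rendered as a hard-margin program of the following form; this is an illustrative reading of the sentence above, not the paper's exact formulation (here $z$ denotes the query/last token and $x_{\mathrm{opt}}$, $x_{\tau}$ the optimal and non-optimal tokens).

```latex
% Illustrative hard-margin program (assumed notation): separate the optimal
% token from non-optimal tokens via linear constraints on outer products
% of token pairs.
\min_{W}\ \|W\|_F
\quad \text{subject to} \quad
\big\langle (x_{\mathrm{opt}} - x_{\tau})\, z^{\top},\; W \big\rangle \ \geq\ 1
\quad \text{for all non-optimal tokens } x_{\tau}.
```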
1 code implementation • 16 Aug 2023 • Haldun Balim, Zhe Du, Samet Oymak, Necmiye Ozay
Particularly, we train the transformer using various distinct systems and then evaluate the performance on unseen systems with unknown dynamics.
no code implementations • 10 Jul 2023 • Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak
These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): clients load a full PTF model once, and all future updates are accomplished through communication-efficient modules with limited catastrophic forgetting, where each task is assigned to its own module.
1 code implementation • NeurIPS 2023 • Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak
Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$.
no code implementations • 6 Jun 2023 • Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis
Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting.
1 code implementation • 2 Jun 2023 • Davoud Ataee Tarzanagh, Mingchen Li, Pranay Sharma, Samet Oymak
Stochastic approximation with multiple coupled sequences (MSA) has found broad applications in machine learning as it encompasses a rich class of problems including bilevel optimization (BLO), multi-level compositional optimization (MCO), and reinforcement learning (specifically, actor-critic methods).
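The following is a schematic two-sequence stochastic approximation sketch (illustrative, not the paper's algorithm): a fast sequence y tracks a quantity depending on x, while a slow sequence x is updated using the current value of y, which is the coupling pattern shared by bilevel optimization and actor-critic methods.

```python
# Schematic coupled-sequence stochastic approximation with two timescales.
import numpy as np

rng = np.random.default_rng(0)
x, y = 0.0, 0.0
alpha, beta = 0.01, 0.1                      # slow and fast step sizes
for t in range(5000):
    y += beta * (2.0 * x - y + rng.normal(scale=0.05))   # fast: y tracks y*(x) = 2x
    x -= alpha * (x + y - 3.0 + rng.normal(scale=0.05))  # slow: drive x + y*(x) to 3
print(round(x, 2), round(y, 2))              # hovers near the fixed point x=1, y=2
```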
no code implementations • 15 May 2023 • Karthik Elamvazhuthi, Xuechen Zhang, Samet Oymak, Fabio Pasqualetti
To address this shortcoming, in this paper we study a class of neural ordinary differential equations that, by design, leave a given manifold invariant, and characterize their properties by leveraging the controllability properties of control affine systems.
no code implementations • 8 Mar 2023 • Yingcong Li, Samet Oymak
A traditional idea in multitask learning (MTL) is to build a shared representation across tasks which can then be adapted to new tasks by tuning the last layers.
no code implementations • 2 Feb 2023 • Yuzhen Qin, Yingcong Li, Fabio Pasqualetti, Maryam Fazel, Samet Oymak
The growing interest in complex decision-making and language modeling problems highlights the importance of sample-efficient learning over very long horizons.
2 code implementations • 17 Jan 2023 • Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak
We first explore the statistical aspects of this abstraction through the lens of multitask learning: We obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d.
no code implementations • 29 Aug 2022 • Yahya Sattar, Samet Oymak, Necmiye Ozay
This motivates the problem of learning bilinear systems from a single trajectory of the system's states and inputs.
no code implementations • 12 May 2022 • Yuzhen Qin, Tommaso Menara, Samet Oymak, ShiNung Ching, Fabio Pasqualetti
Humans are capable of adjusting to changing environments flexibly and quickly.
3 code implementations • 4 May 2022 • Davoud Ataee Tarzanagh, Mingchen Li, Christos Thrampoulidis, Samet Oymak
Standard federated optimization methods successfully apply to stochastic problems with single-level structure.
1 code implementation • 30 Mar 2022 • Yue Sun, Samet Oymak, Maryam Fazel
Hankel regularization encourages the low-rankness of the Hankel matrix, which maps to the low-orderness of the system.
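To make the Hankel low-rankness connection concrete, the sketch below (an illustration, not the paper's estimator) builds a Hankel matrix from an impulse response and computes its nuclear norm, the quantity a Hankel regularizer penalizes; a low-order system yields a low-rank Hankel matrix.

```python
# Illustrative sketch: Hankel matrix of an impulse response and its nuclear norm.
import numpy as np

def hankel_matrix(g, rows):
    """Hankel matrix whose (i, j) entry is g[i + j]."""
    cols = len(g) - rows + 1
    return np.array([[g[i + j] for j in range(cols)] for i in range(rows)])

g = 0.8 ** np.arange(10)            # impulse response of a first-order system
H = hankel_matrix(g, rows=5)
nuclear_norm = np.linalg.svd(H, compute_uv=False).sum()
print(H.shape, round(nuclear_norm, 3))  # low-order system => (numerically) rank-1 Hankel
```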
1 code implementation • 3 Mar 2022 • Yingcong Li, Mingchen Li, M. Salman Asif, Samet Oymak
In continual learning (CL), the goal is to design models that can learn a sequence of tasks without catastrophic forgetting.
1 code implementation • NeurIPS 2021 • Yue Sun, Adhyyan Narang, Halil Ibrahim Gulluk, Samet Oymak, Maryam Fazel
Specifically, for (1), we first show that learning the optimal representation coincides with the problem of designing a task-aware regularization to promote inductive bias.
no code implementations • 13 Jan 2022 • Yuzhen Qin, Tommaso Menara, Samet Oymak, ShiNung Ching, Fabio Pasqualetti
In this paper, we study representation learning for multi-task decision-making in non-stationary environments.
1 code implementation • NeurIPS 2021 • Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, Samet Oymak
Our experimental findings are complemented with theoretical insights on loss function design and the benefits of train-validation split.
no code implementations • 13 Nov 2021 • Yahya Sattar, Zhe Du, Davoud Ataee Tarzanagh, Laura Balzano, Necmiye Ozay, Samet Oymak
Combining our sample complexity results with recent perturbation results for certainty equivalent control, we prove that when the episode lengths are appropriately chosen, the proposed adaptive control scheme achieves $\mathcal{O}(\sqrt{T})$ regret, which can be improved to $\mathcal{O}(\mathrm{polylog}(T))$ with partial knowledge of the system.
no code implementations • 6 Oct 2021 • Xuechen Zhang, Samet Oymak, Jiasi Chen
Estimating how well a machine learning model performs during inference is critical in a variety of scenarios (for example, to quantify uncertainty, or to choose from a library of available models).
no code implementations • 26 May 2021 • Zhe Du, Yahya Sattar, Davoud Ataee Tarzanagh, Laura Balzano, Samet Oymak, Necmiye Ozay
Real-world control applications often involve complex dynamics subject to abrupt changes or variations.
no code implementations • 29 Apr 2021 • Samet Oymak, Mingchen Li, Mahdi Soltanolkotabi
In this approach, it is common to use bilevel optimization where one optimizes the model weights over the training data (inner problem) and various hyperparameters such as the configuration of the architecture over the validation data (outer problem).
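A schematic bilevel sketch of this setup follows (illustrative, not the paper's method): the inner problem fits weights on the training data given a hyperparameter, and the outer problem adjusts the hyperparameter to reduce validation loss (a grid search stands in for a gradient-based outer update).

```python
# Schematic bilevel structure: inner ridge fit on train data, outer
# hyperparameter selection on validation data.
import numpy as np

rng = np.random.default_rng(0)
Xtr, ytr = rng.normal(size=(80, 5)), rng.normal(size=80)
Xva, yva = rng.normal(size=(40, 5)), rng.normal(size=40)

def inner_solve(lam):
    # Closed-form solution of the inner (training) ridge problem.
    d = Xtr.shape[1]
    return np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T @ ytr)

def val_loss(lam):
    w = inner_solve(lam)
    return np.mean((Xva @ w - yva) ** 2)

# Outer problem: pick the hyperparameter minimizing the validation loss.
grid = np.logspace(-3, 3, 25)
best_lam = min(grid, key=val_loss)
print(best_lam)
```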
1 code implementation • CVPR 2021 • Sk Miraj Ahmed, Dripta S. Raychaudhuri, Sujoy Paul, Samet Oymak, Amit K. Roy-Chowdhury
A recent line of work addressed this problem and proposed an algorithm that transfers knowledge to the unlabeled target domain from a single source model without requiring access to the source data.
1 code implementation • NeurIPS 2021 • Ganesh Ramachandra Kini, Orestis Paraskevas, Samet Oymak, Christos Thrampoulidis
The goal in label-imbalanced and group-sensitive classification is to optimize relevant metrics such as balanced error and equal opportunity.
no code implementations • 22 Feb 2021 • Samet Oymak
Conventional wisdom dictates that the learning rate should be in the stable regime so that gradient-based algorithms don't blow up.
no code implementations • 14 Feb 2021 • Halil Ibrahim Gulluk, Yue Sun, Samet Oymak, Maryam Fazel
We prove that subspace-based representations can be learned in a sample-efficient manner and provably benefit future tasks in terms of sample complexity.
no code implementations • 16 Dec 2020 • Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis
Deep networks are typically trained with many more parameters than the size of the training dataset.
no code implementations • 16 Nov 2020 • Yao-Chun Chan, Mingchen Li, Samet Oymak
In parallel, recent developments in self-supervised and semi-supervised learning (S4L) provide powerful techniques, based on data augmentation, contrastive learning, and self-training, that enable superior utilization of unlabeled data and have led to a significant reduction in the labeling required on standard machine learning benchmarks.
no code implementations • NeurIPS 2020 • Christos Thrampoulidis, Samet Oymak, Mahdi Soltanolkotabi
Our theoretical analysis allows us to precisely characterize how the test error varies across different training algorithms, data distributions, problem dimensions, numbers of classes, inter/intra-class correlations, and class priors.
no code implementations • 5 Jul 2020 • A. B. Siddique, Samet Oymak, Vagelis Hristidis
Our evaluation also shows that PUP achieves a favorable trade-off between semantic similarity and diversity of expression.
no code implementations • 19 Jun 2020 • Mingchen Li, Yahya Sattar, Christos Thrampoulidis, Samet Oymak
Model pruning is an essential procedure for building compact and computationally-efficient machine learning models.
no code implementations • 19 Jun 2020 • Samet Oymak, Talha Cihad Gulcu
We then establish a connection between self-training based semi-supervision and the more general problem of learning with heterogeneous data and weak supervision.
no code implementations • L4DC 2020 • Yue Sun, Samet Oymak, Maryam Fazel
This paper studies low-order linear system identification via regularized regression.
no code implementations • 23 Feb 2020 • Yuan Zhao, Jiasi Chen, Samet Oymak
We demonstrate that this leads to heterogeneous confidence/accuracy behavior in the test data, which is poorly handled by standard calibration algorithms.
no code implementations • 20 Feb 2020 • Yahya Sattar, Samet Oymak
If the system is run by a stabilizing input policy, we show that temporally-dependent samples can be approximated by i.i.d.
no code implementations • 25 Sep 2019 • Samet Oymak, Zalan Fabian, Mingchen Li, Mahdi Soltanolkotabi
We show that over the information space learning is fast and one can quickly train a model with zero training loss that can also generalize well.
no code implementations • 3 Jul 2019 • Yahya Sattar, Samet Oymak
We propose projected gradient descent (PGD) algorithm to estimate the population minimizer in the finite sample regime.
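The following is a generic projected gradient descent (PGD) sketch, illustrating the type of estimator described; the loss and projection set here are placeholders, not the paper's finite-sample objective.

```python
# Generic PGD sketch: gradient step followed by projection onto a constraint set.
import numpy as np

def pgd(grad, project, theta0, step=0.1, iters=300):
    theta = theta0
    for _ in range(iters):
        theta = project(theta - step * grad(theta))
    return theta

# Toy use: least squares with a projection onto the unit Euclidean ball.
rng = np.random.default_rng(0)
A, y = rng.normal(size=(50, 5)), rng.normal(size=50)
grad = lambda th: A.T @ (A @ th - y) / len(y)
project = lambda th: th / max(1.0, np.linalg.norm(th))
theta_hat = pgd(grad, project, np.zeros(5))
```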
no code implementations • 12 Jun 2019 • Samet Oymak, Zalan Fabian, Mingchen Li, Mahdi Soltanolkotabi
We show that over the information space learning is fast and one can quickly train a model with zero training loss that can also generalize well.
1 code implementation • 27 Mar 2019 • Mingchen Li, Mahdi Soltanolkotabi, Samet Oymak
In particular, we prove that: (i) In the first few iterations, where the updates are still in the vicinity of the initialization, gradient descent only fits the correct labels, essentially ignoring the noisy labels.
no code implementations • 12 Feb 2019 • Samet Oymak, Mahdi Soltanolkotabi
However, in practice much more moderate levels of overparameterization seem to be sufficient, and in many cases overparameterized models seem to perfectly interpolate the training data as soon as the number of parameters exceeds the size of the training data by a constant factor.
no code implementations • 25 Dec 2018 • Samet Oymak, Mahdi Soltanolkotabi
In this paper we demonstrate that when the loss has certain properties over a minimally small neighborhood of the initial point, first order methods such as (stochastic) gradient descent have a few intriguing properties: (1) the iterates converge at a geometric rate to a global optimum even when the loss is nonconvex, (2) among all global optima of the loss, the iterates converge to one with a near-minimal distance to the initial point, (3) the iterates take a near-direct route from the initial point to this global optimum.
no code implementations • ICLR 2019 • Samet Oymak
We study discrete time dynamical systems governed by the state equation $h_{t+1}=\phi(Ah_t+Bu_t)$.
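The stated dynamics can be simulated directly, as in the short sketch below; the choice $\phi=\tanh$ and the random matrices and inputs are arbitrary illustrations.

```python
# Direct simulation of h_{t+1} = phi(A h_t + B u_t) with phi = tanh.
import numpy as np

rng = np.random.default_rng(0)
n, p, T = 4, 2, 20
A = 0.5 * rng.normal(size=(n, n)) / np.sqrt(n)   # mildly contractive transition
B = rng.normal(size=(n, p))
h = np.zeros(n)
trajectory = []
for t in range(T):
    u = rng.normal(size=p)                       # random excitation input
    h = np.tanh(A @ h + B @ u)
    trajectory.append(h.copy())
```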
1 code implementation • 14 Jun 2018 • Samet Oymak, Necmiye Ozay
We consider the problem of learning a realization for a linear time-invariant (LTI) dynamical system from input/output data.
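A textbook Ho-Kalman-style sketch of one standard route to such a realization is shown below (assumed here for illustration, not quoted from the paper): a state-space model is recovered, up to a similarity transform, from scalar Markov parameters $G_k = CA^kB$.

```python
# Ho-Kalman-style realization from scalar Markov parameters (illustrative).
import numpy as np

def ho_kalman(markov, order):
    L = len(markov) // 2
    H = np.array([[markov[i + j] for j in range(L)] for i in range(L)])
    U, s, Vt = np.linalg.svd(H)
    U, s, Vt = U[:, :order], s[:order], Vt[:order]
    O = U * np.sqrt(s)                       # observability factor
    Q = np.sqrt(s)[:, None] * Vt             # controllability factor
    C_hat, B_hat = O[:1, :], Q[:, :1]
    A_hat = np.linalg.pinv(O[:-1]) @ O[1:]   # exploit the shift structure of O
    return A_hat, B_hat, C_hat               # recovered up to similarity transform

# True single-state system A=0.8, B=C=1, whose Markov parameters are 0.8**k.
G = [0.8 ** k for k in range(10)]
A_hat, B_hat, C_hat = ho_kalman(G, order=1)
print(np.round(A_hat, 3))                    # close to 0.8
```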
no code implementations • 11 Jun 2018 • Amir Asiaee, Samet Oymak, Kevin R. Coombes, Arindam Banerjee
We consider the problem of multi-task learning in the high dimensional setting.
no code implementations • 16 May 2018 • Samet Oymak, Mahdi Soltanolkotabi
In this paper we study the problem of learning the weights of a deep convolutional neural network.
no code implementations • ICML 2018 • Samet Oymak
Proper regularization is critical for speeding up training, improving generalization performance, and learning compact models that are cost efficient.
no code implementations • 20 May 2017 • Samet Oymak, Mehrdad Mahdavi, Jiasi Chen
Evaluations on synthetic and real datasets demonstrate that the algorithm is competitive with the current state of the art and accurately learns feature nonlinearities.
no code implementations • 23 Oct 2016 • Samet Oymak, Mahdi Soltanolkotabi
In this paper we study the problem of recovering a structured but unknown parameter $\boldsymbol{\theta}^*$ from $n$ nonlinear observations of the form $y_i=f(\langle \boldsymbol{x}_i,\boldsymbol{\theta}^*\rangle)$ for $i=1, 2,\ldots, n$.
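The data model can be illustrated with the short sketch below; the nonlinearity $f=\tanh$ is an arbitrary choice for the demo, and the least-squares fit recovers the direction of $\boldsymbol{\theta}^*$ up to a scalar, a classical observation this line of work builds on.

```python
# Illustrative single-index data model y_i = f(<x_i, theta*>) and a
# least-squares fit that recovers the direction of theta* up to scale.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
X = rng.normal(size=(n, d))
y = np.tanh(X @ theta_star)                  # f = tanh (illustrative choice)

theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
cosine = theta_ls @ theta_star / np.linalg.norm(theta_ls)
print(round(float(cosine), 3))               # close to 1: direction recovered
```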
no code implementations • 14 Dec 2015 • Samet Oymak, Ben Recht
We characterize the tradeoff between distortion and sample complexity $m$ in terms of the Gaussian width $\omega(K)$ of the set.
no code implementations • 30 Nov 2015 • Samet Oymak, Joel A. Tropp
In the Euclidean setting, one fundamental technique for dimension reduction is to apply a random linear map to the data.
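The technique named here can be shown in a few lines: a random Gaussian linear map projects high-dimensional points to a lower dimension while approximately preserving pairwise distances (the constants and dimensions below are illustrative).

```python
# Random Gaussian linear map for dimension reduction.
import numpy as np

rng = np.random.default_rng(0)
d, m, n_pts = 1000, 200, 5
X = rng.normal(size=(n_pts, d))
S = rng.normal(size=(m, d)) / np.sqrt(m)   # random linear map to dimension m
Y = X @ S.T

orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(round(proj / orig, 3))               # close to 1 for m large enough
```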
no code implementations • NeurIPS 2015 • Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael. I. Jordan
We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds and achieve nearly linear speedups, provably.
no code implementations • 16 Jul 2015 • Samet Oymak, Benjamin Recht, Mahdi Soltanolkotabi
We sharply characterize the convergence rate associated with a wide variety of random measurement ensembles in terms of the number of measurements and structural complexity of the signal with respect to the chosen penalty function.
no code implementations • 11 Jun 2015 • Samet Oymak, Benjamin Recht, Mahdi Soltanolkotabi
In this paper we show that for the purposes of dimensionality reduction certain class of structured random matrices behave similarly to random Gaussian matrices.
no code implementations • NeurIPS 2014 • Ramya Korlakai Vinayak, Samet Oymak, Babak Hassibi
We consider the problem of finding clusters in an unweighted graph, when the graph is partially observed.
no code implementations • 4 Nov 2013 • Samet Oymak, Christos Thrampoulidis, Babak Hassibi
The first LASSO estimator assumes a priori knowledge of $f(x_0)$ and is given by $\arg\min_{x}\{\|y-Ax\|_2~\text{subject to}~f(x)\leq f(x_0)\}$.
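A minimal sketch of this constrained estimator with $f$ taken as the $\ell_1$ norm follows: projected gradient descent on the residual (squared here for simplicity, whereas the displayed estimator uses the unsquared residual), projecting onto the $\ell_1$ ball $\{x : \|x\|_1 \leq \|x_0\|_1\}$ via the standard sort-and-threshold routine.

```python
# Constrained LASSO sketch: PGD with projection onto an l1 ball.
import numpy as np

def project_l1_ball(v, radius):
    # Euclidean projection onto {x : ||x||_1 <= radius}.
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    tau = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
n, d, k = 80, 200, 5
x0 = np.zeros(d); x0[:k] = 1.0                  # k-sparse ground truth
A = rng.normal(size=(n, d)) / np.sqrt(n)
y = A @ x0
x = np.zeros(d)
step = 1.0 / np.linalg.norm(A, 2) ** 2          # step sized by the spectral norm
for _ in range(500):
    x = project_l1_ball(x - step * A.T @ (A @ x - y), np.abs(x0).sum())
print(round(float(np.linalg.norm(x - x0)), 3))  # error shrinks toward zero
```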