no code implementations • 12 Mar 2024 • Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak
We show that training self-attention with gradient descent learns an automaton which generates the next token in two distinct steps: $\textbf{(1) Hard retrieval:}$ Given an input sequence, self-attention precisely selects the $\textit{high-priority input tokens}$ associated with the last input token.
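A minimal sketch of the hard-retrieval intuition (all names, shapes, and data below are illustrative assumptions, not the paper's code): as attention logits grow during training, the softmax concentrates its weight on the top-scoring input token, approaching a hard, one-hot selection.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 8
tokens = rng.standard_normal((5, d))   # hypothetical input sequence of 5 tokens
W = rng.standard_normal((d, d))        # hypothetical combined query-key weights

# Attention scores of the last token against every input token.
scores = tokens @ W @ tokens[-1]

# Scaling the logits (e.g., via a growing weight norm during training) pushes
# the softmax toward a one-hot selection of the top-scoring token.
for scale in (1.0, 10.0, 100.0):
    print(scale, np.round(softmax(scale * scores), 3))
```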
no code implementations • 21 Feb 2024 • M. Emrullah Ildiz, Yixiao Huang, Yingcong Li, Ankit Singh Rawat, Samet Oymak
Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation.
1 code implementation • 31 Aug 2023 • Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak
In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer products of token pairs.
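To make the SVM analogy concrete, here is a minimal sketch (the token data, the choice of which token is "optimal", and the use of scikit-learn with a large $C$ to approximate a hard margin are all assumptions for illustration): each token pair is featurized by the flattened outer product with the last token, and a linear SVM separates the optimal token from the rest.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
d, T = 4, 6
X = rng.standard_normal((T, d))        # hypothetical token sequence
last = X[-1]

# Feature for token i: the outer product of (last token, token i), flattened,
# so a linear separator imposes linear constraints on token-pair outer products.
feats = np.stack([np.outer(last, x).ravel() for x in X[:-1]])
labels = np.array([1, -1, -1, -1, -1])  # token 0 marked "optimal" for illustration

# A large C approximates a hard-margin linear SVM.
svm = SVC(kernel="linear", C=1e6).fit(feats, labels)
print(svm.decision_function(feats))
```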
1 code implementation • NeurIPS 2023 • Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak
Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$.
no code implementations • 8 Mar 2023 • Yingcong Li, Samet Oymak
A traditional idea in multitask learning (MTL) is to build a shared representation across tasks that can then be adapted to new tasks by tuning the last layers.
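A minimal sketch of that traditional recipe (dimensions and architecture are hypothetical, chosen only to illustrate the shared-trunk/task-head split): all tasks share one representation, each task gets its own last layer, and a new task is handled by freezing the trunk and tuning a fresh head.

```python
import torch
import torch.nn as nn

d_in, d_rep, n_tasks = 16, 8, 3  # hypothetical sizes

shared = nn.Sequential(nn.Linear(d_in, d_rep), nn.ReLU())             # representation shared across tasks
heads = nn.ModuleList([nn.Linear(d_rep, 1) for _ in range(n_tasks)])  # task-specific last layers

def forward(x, task):
    return heads[task](shared(x))

# Adapting to a new task: freeze the shared representation, tune only a new head.
for p in shared.parameters():
    p.requires_grad = False
new_head = nn.Linear(d_rep, 1)
opt = torch.optim.SGD(new_head.parameters(), lr=0.1)
```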
no code implementations • 2 Feb 2023 • Yuzhen Qin, Yingcong Li, Fabio Pasqualetti, Maryam Fazel, Samet Oymak
The growing interest in complex decision-making and language modeling problems highlights the importance of sample-efficient learning over very long horizons.
2 code implementations • 17 Jan 2023 • Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak
We first explore the statistical aspects of this abstraction through the lens of multitask learning: We obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d. (input, label) pairs or (2) a trajectory arising from a dynamical system.
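A minimal sketch of setting (1) to fix ideas (the linear task, the data, and the prompt layout are illustrative assumptions): an ICL prompt is a sequence of i.i.d. (input, label) pairs drawn from one task, followed by a query input the model must label in context.

```python
import numpy as np

rng = np.random.default_rng(2)

w_task = rng.standard_normal(3)      # hypothetical task parameter
xs = rng.standard_normal((5, 3))
ys = xs @ w_task                     # i.i.d. (input, label) pairs from the task
query = rng.standard_normal(3)

# The prompt: labeled demonstrations followed by an unlabeled query.
prompt = [(x, y) for x, y in zip(xs, ys)] + [(query, None)]
print(len(prompt), "elements: 5 labeled examples + 1 query")
```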
no code implementations • 19 May 2022 • Yingcong Li, Chandra Sekhar Mukherjee, Jiapeng Zhang
We validate our algorithm on single-cell RNA-seq data, a powerful and widely used tool in biology.
1 code implementation • 3 Mar 2022 • Yingcong Li, Mingchen Li, M. Salman Asif, Samet Oymak
In continual learning (CL), the goal is to design models that can learn a sequence of tasks without catastrophic forgetting.
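A minimal sketch of the CL setup and the failure mode it guards against (the model, tasks, and data are hypothetical): one model is trained on tasks in sequence, and with no mitigation, accuracy on the first task typically degrades as later tasks overwrite it (catastrophic forgetting).

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Three hypothetical tasks, each a small classification dataset.
tasks = [(torch.randn(32, 4), torch.randint(0, 2, (32,))) for _ in range(3)]

for t, (x, y) in enumerate(tasks):
    for _ in range(50):                 # naive fine-tuning on the current task
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    # Track how well the first task survives training on later tasks.
    with torch.no_grad():
        acc0 = (model(tasks[0][0]).argmax(1) == tasks[0][1]).float().mean()
    print(f"after task {t}: accuracy on task 0 = {acc0:.2f}")
```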
no code implementations • 16 Dec 2020 • Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis
Deep networks are typically trained with many more parameters than the size of the training dataset.