Search Results for author: Yingcong Li

Found 10 papers, 4 papers with code

Mechanics of Next Token Prediction with Self-Attention

no code implementations • 12 Mar 2024 • Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak

We show that training self-attention with gradient descent learns an automaton which generates the next token in two distinct steps: $\textbf{(1)}$ $\textbf{Hard retrieval:}$ Given an input sequence, self-attention precisely selects the $\textit{high-priority input tokens}$ associated with the last input token.

Retrieval
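
The hard-retrieval step described in the entry above can be pictured with a toy softmax computation. The sketch below is an illustrative assumption (random embeddings and a single combined weight matrix), not the paper's construction.

```python
# Toy illustration (an assumption for exposition, not the paper's construction):
# as the combined key-query weights grow in norm, the softmax attention of the
# last token concentrates on a single "high-priority" token, i.e. it approaches
# a hard retrieval (argmax) step.
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                          # sequence length, embedding dimension
X = rng.normal(size=(T, d))          # hypothetical token embeddings x_1, ..., x_T
W = rng.normal(size=(d, d))          # stands in for the combined attention weights

for scale in [1.0, 5.0, 25.0]:
    logits = X @ (scale * W) @ X[-1]     # scores of every token w.r.t. the last token
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax attention weights
    print(f"scale={scale:5.1f}  attention={np.round(probs, 3)}")
# As the scale grows, the attention distribution approaches a one-hot vector on
# the highest-scoring token.
```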

From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers

no code implementations • 21 Feb 2024 • M. Emrullah Ildiz, Yixiao Huang, Yingcong Li, Ankit Singh Rawat, Samet Oymak

Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation.

Text Generation

Transformers as Support Vector Machines

1 code implementation • 31 Aug 2023 • Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak

In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer products of token pairs.
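
As a rough sketch of the kind of hard-margin program described here (the notation is illustrative, not necessarily the paper's exact formulation), the attention weights $W$ are driven toward the minimum-norm solution that separates each sequence's optimal token from the rest by a unit margin:

$$
\min_{W}\ \|W\|_F \quad \text{s.t.} \quad x_{\mathrm{last}}^\top W \,\big(x_{\mathrm{opt}} - x_{t}\big) \;\ge\; 1 \quad \text{for every non-optimal token } x_t .
$$

Each constraint is linear in $W$ through the outer product $(x_{\mathrm{opt}} - x_t)\,x_{\mathrm{last}}^\top$, which is the sense in which token-pair outer products enter.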

Max-Margin Token Selection in Attention Mechanism

1 code implementation • NeurIPS 2023 • Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak

Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$.

Provable Pathways: Learning Multiple Tasks over Multiple Paths

no code implementations8 Mar 2023 Yingcong Li, Samet Oymak

A traditional idea in multitask learning (MTL) is building a shared representation across tasks which can then be adapted to new tasks by tuning last layers.

Generalization Bounds
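
The "shared representation adapted by tuning last layers" idea mentioned in the entry above is sketched below in a minimal PyTorch form; module names and sizes are assumptions for illustration, and the paper's multi-path construction is not reproduced here.

```python
# Minimal sketch of the traditional MTL pattern: a shared trunk (representation)
# with one lightweight head per task; a new task reuses the frozen trunk and
# trains only a fresh head.
import torch
import torch.nn as nn

class SharedRepMTL(nn.Module):
    def __init__(self, in_dim: int, rep_dim: int, num_tasks: int, out_dim: int = 1):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, rep_dim), nn.ReLU())  # shared representation
        self.heads = nn.ModuleList(nn.Linear(rep_dim, out_dim) for _ in range(num_tasks))

    def forward(self, x: torch.Tensor, task: int) -> torch.Tensor:
        return self.heads[task](self.trunk(x))

model = SharedRepMTL(in_dim=16, rep_dim=8, num_tasks=3)

# Adapting to a new task: freeze the shared trunk, train only a new head.
for p in model.trunk.parameters():
    p.requires_grad = False
model.heads.append(nn.Linear(8, 1))   # the new head's parameters remain trainable
```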

Stochastic Contextual Bandits with Long Horizon Rewards

no code implementations • 2 Feb 2023 • Yuzhen Qin, Yingcong Li, Fabio Pasqualetti, Maryam Fazel, Samet Oymak

The growing interest in complex decision-making and language modeling problems highlights the importance of sample-efficient learning over very long horizons.

Decision Making Language Modelling +1

Transformers as Algorithms: Generalization and Stability in In-context Learning

2 code implementations • 17 Jan 2023 • Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak

We first explore the statistical aspects of this abstraction through the lens of multitask learning: we obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d. (input, label) pairs.

Generalization Bounds In-Context Learning +3
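
The i.i.d.-prompt setting mentioned in the entry above can be pictured as follows; the linear-regression task, shapes, and flattening convention are assumptions for illustration, not the paper's experimental setup.

```python
# Minimal sketch of an in-context learning prompt: i.i.d. (input, label) pairs
# drawn from one task, followed by a query input that the sequence model should
# label using only the in-context examples.
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 8                          # input dimension, number of in-context examples
w_task = rng.normal(size=d)          # hypothetical task parameter (linear regression)

X = rng.normal(size=(n, d))                     # i.i.d. inputs
y = X @ w_task + 0.1 * rng.normal(size=n)       # labels from the same task

x_query = rng.normal(size=d)
prompt = np.concatenate([np.column_stack([X, y]).ravel(), x_query])  # (x_1, y_1, ..., x_n, y_n, x_query)
print(prompt.shape)   # one flat prompt the model would map to a prediction for x_query
```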

Confident Clustering via PCA Compression Ratio and Its Application to Single-cell RNA-seq Analysis

no code implementations • 19 May 2022 • Yingcong Li, Chandra Sekhar Mukherjee, Jiapeng Zhang

We validate our algorithm on single-cell RNA-seq data, which is a powerful and widely used tool in biology.

Clustering
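
One plausible reading of a "PCA compression ratio" (an assumed interpretation for illustration, not a definition taken from the paper) is the fraction of a point set's variance captured by its top principal components; a compact, low-dimensional cluster compresses well under such a score.

```python
# Hedged sketch: a possible "PCA compression ratio" (assumed interpretation, not
# necessarily the paper's definition) -- the fraction of variance captured by the
# top-k principal components of a set of points.
import numpy as np

def pca_compression_ratio(X: np.ndarray, k: int = 2) -> float:
    Xc = X - X.mean(axis=0)                      # center the points
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values
    var = s ** 2
    return float(var[:k].sum() / var.sum())      # variance explained by top-k components

rng = np.random.default_rng(1)
low_rank = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 20))   # points near a 2-D subspace
isotropic = rng.normal(size=(200, 20))                            # unstructured points
print(pca_compression_ratio(low_rank))    # close to 1.0
print(pca_compression_ratio(isotropic))   # much smaller
```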

Provable and Efficient Continual Representation Learning

1 code implementation • 3 Mar 2022 • Yingcong Li, Mingchen Li, M. Salman Asif, Samet Oymak

In continual learning (CL), the goal is to design models that can learn a sequence of tasks without catastrophic forgetting.

Continual Learning Representation Learning
