no code implementations • 12 Mar 2024 • Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak
We show that training self-attention with gradient descent learns an automaton which generates the next token in two distinct steps: $\textbf{(1) Hard retrieval:}$ Given an input sequence, self-attention precisely selects the $\textit{high-priority input tokens}$ associated with the last input token.
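A minimal sketch of the hard-retrieval intuition (all names, shapes, and data below are illustrative assumptions, not the paper's code): as attention logits grow during training, the softmax concentrates its weight on the top-scoring input token, approaching a hard, one-hot selection.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 8
tokens = rng.standard_normal((5, d))   # hypothetical input sequence of 5 tokens
W = rng.standard_normal((d, d))        # hypothetical combined query-key weights

# Attention scores of the last token against every input token.
scores = tokens @ W @ tokens[-1]

# Scaling the logits (e.g., via a growing weight norm during training) pushes
# the softmax toward a one-hot selection of the top-scoring token.
for scale in (1.0, 10.0, 100.0):
    print(scale, np.round(softmax(scale * scores), 3))
```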
no code implementations • 21 Feb 2024 • M. Emrullah Ildiz, Yixiao Huang, Yingcong Li, Ankit Singh Rawat, Samet Oymak
Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation.
1 code implementation • 31 Aug 2023 • Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak
In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer products of token pairs.
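To make the SVM analogy concrete, here is a minimal sketch (the token data, the choice of which token is "optimal", and the use of scikit-learn with a large $C$ to approximate a hard margin are all assumptions for illustration): each token pair is featurized by the flattened outer product with the last token, and a linear SVM separates the optimal token from the rest.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
d, T = 4, 6
X = rng.standard_normal((T, d))        # hypothetical token sequence
last = X[-1]

# Feature for token i: the outer product of (last token, token i), flattened,
# so a linear separator imposes linear constraints on token-pair outer products.
feats = np.stack([np.outer(last, x).ravel() for x in X[:-1]])
labels = np.array([1, -1, -1, -1, -1])  # token 0 marked "optimal" for illustration

# A large C approximates a hard-margin linear SVM.
svm = SVC(kernel="linear", C=1e6).fit(feats, labels)
print(svm.decision_function(feats))
```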
1 code implementation • NeurIPS 2023 • Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak
Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$.
no code implementations • 8 Mar 2023 • Yingcong Li, Samet Oymak
A traditional idea in multitask learning (MTL) is to build a shared representation across tasks that can then be adapted to new tasks by tuning the last layers.
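A minimal sketch of that traditional recipe (dimensions and architecture are hypothetical, chosen only to illustrate the shared-trunk/task-head split): all tasks share one representation, each task gets its own last layer, and a new task is handled by freezing the trunk and tuning a fresh head.

```python
import torch
import torch.nn as nn

d_in, d_rep, n_tasks = 16, 8, 3  # hypothetical sizes

shared = nn.Sequential(nn.Linear(d_in, d_rep), nn.ReLU())             # representation shared across tasks
heads = nn.ModuleList([nn.Linear(d_rep, 1) for _ in range(n_tasks)])  # task-specific last layers

def forward(x, task):
    return heads[task](shared(x))

# Adapting to a new task: freeze the shared representation, tune only a new head.
for p in shared.parameters():
    p.requires_grad = False
new_head = nn.Linear(d_rep, 1)
opt = torch.optim.SGD(new_head.parameters(), lr=0.1)
```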
no code implementations • 2 Feb 2023 • Yuzhen Qin, Yingcong Li, Fabio Pasqualetti, Maryam Fazel, Samet Oymak
The growing interest in complex decision-making and language modeling problems highlights the importance of sample-efficient learning over very long horizons.
2 code implementations • 17 Jan 2023 • Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak
We first explore the statistical aspects of this abstraction through the lens of multitask learning: We obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d. (input, label) pairs or (2) a trajectory arising from a dynamical system.
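A minimal sketch of setting (1) to fix ideas (the linear task, the data, and the prompt layout are illustrative assumptions): an ICL prompt is a sequence of i.i.d. (input, label) pairs drawn from one task, followed by a query input the model must label in context.

```python
import numpy as np

rng = np.random.default_rng(2)

w_task = rng.standard_normal(3)      # hypothetical task parameter
xs = rng.standard_normal((5, 3))
ys = xs @ w_task                     # i.i.d. (input, label) pairs from the task
query = rng.standard_normal(3)

# The prompt: labeled demonstrations followed by an unlabeled query.
prompt = [(x, y) for x, y in zip(xs, ys)] + [(query, None)]
print(len(prompt), "elements: 5 labeled examples + 1 query")
```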
no code implementations • 19 May 2022 • Yingcong Li, Chandra Sekhar Mukherjee, Jiapeng Zhang
We validate our algorithm on single-cell RNA-seq data, a powerful and widely used tool in biology.
1 code implementation • 3 Mar 2022 • Yingcong Li, Mingchen Li, M. Salman Asif, Samet Oymak
In continual learning (CL), the goal is to design models that can learn a sequence of tasks without catastrophic forgetting.
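A minimal sketch of the CL setup and the failure mode it guards against (the model, tasks, and data are hypothetical): one model is trained on tasks in sequence, and with no mitigation, accuracy on the first task typically degrades as later tasks overwrite it (catastrophic forgetting).

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Three hypothetical tasks, each a small classification dataset.
tasks = [(torch.randn(32, 4), torch.randint(0, 2, (32,))) for _ in range(3)]

for t, (x, y) in enumerate(tasks):
    for _ in range(50):                 # naive fine-tuning on the current task
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    # Track how well the first task survives training on later tasks.
    with torch.no_grad():
        acc0 = (model(tasks[0][0]).argmax(1) == tasks[0][1]).float().mean()
    print(f"after task {t}: accuracy on task 0 = {acc0:.2f}")
```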
no code implementations • 16 Dec 2020 • Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis
Deep networks are typically trained with many more parameters than the size of the training dataset.