# How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes

Large language models (LLMs) have recently shown an extraordinary ability to perform unseen tasks based on few-shot examples provided as text, also known as in-context learning (ICL).


# Uncovering hidden geometry in Transformers via disentangling position and context

1 code implementation, 7 Oct 2023

Given embedding vector $\boldsymbol{h}_{c, t} \in \mathbb{R}^d$ at sequence position $t \le T$ in a sequence (or context) $c \le C$, extracting the mean effects yields the decomposition $\boldsymbol{h}_{c, t} = \boldsymbol{\mu} + \mathbf{pos}_t + \mathbf{ctx}_c + \mathbf{resid}_{c, t}$ where $\boldsymbol{\mu}$ is the global mean vector, $\mathbf{pos}_t$ and $\mathbf{ctx}_c$ are the mean vectors across contexts and across positions respectively, and $\mathbf{resid}_{c, t}$ is the residual vector.
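The mean-effect decomposition above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic embeddings (the array shapes and random data are assumptions, not the paper's setup): each component is obtained by averaging over the appropriate axis and centering by the global mean, and the four terms reconstruct the original tensor exactly.

```python
import numpy as np

# Hypothetical embeddings: C contexts, T positions, dimension d (synthetic data).
rng = np.random.default_rng(0)
C, T, d = 8, 16, 32
H = rng.normal(size=(C, T, d))

# Global mean over all (context, position) pairs.
mu = H.mean(axis=(0, 1))                        # shape (d,)
# Positional mean effect pos_t: average over contexts, centered by mu.
pos = H.mean(axis=0) - mu                       # shape (T, d)
# Contextual mean effect ctx_c: average over positions, centered by mu.
ctx = H.mean(axis=1) - mu                       # shape (C, d)
# Residual resid_{c,t}: what remains after removing mu, pos_t, and ctx_c.
resid = H - mu - pos[None, :, :] - ctx[:, None, :]

# The decomposition h_{c,t} = mu + pos_t + ctx_c + resid_{c,t} is exact.
H_rec = mu + pos[None, :, :] + ctx[:, None, :] + resid
assert np.allclose(H, H_rec)
```

By construction the residuals average to zero across contexts and across positions, which is what makes the decomposition an extraction of mean effects rather than an arbitrary split.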


# Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage

no code implementations, 6 Jun 2023

Firstly, through empirical and theoretical analysis, we identify two crucial effects -- expansion and shrinkage -- induced by the contrastive loss on the projectors.

# Tractability from overparametrization: The example of the negative perceptron

In the negative perceptron problem we are given $n$ data points $({\boldsymbol x}_i, y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i\in\{+1,-1\}$ is a binary label.
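The problem setup can be made concrete with a short sketch. The synthetic data and the candidate weight vector below are illustrative assumptions; the snippet only shows the objects involved ($n$ points in $\mathbb{R}^d$ with $\pm 1$ labels) and the signed margin $\min_i y_i \langle \boldsymbol{x}_i, \boldsymbol{w} \rangle / \|\boldsymbol{w}\|$, which in the negative perceptron is allowed to be negative.

```python
import numpy as np

# Synthetic instance of the problem setup (illustrative, not from the paper).
rng = np.random.default_rng(1)
n, d = 200, 50
X = rng.normal(size=(n, d))          # n data points x_i in R^d
y = rng.choice([-1.0, 1.0], size=n)  # binary labels y_i in {+1, -1}

def margin(w, X, y):
    """Smallest signed margin min_i y_i <x_i, w> / ||w|| for a candidate w."""
    return np.min(y * (X @ w)) / np.linalg.norm(w)

w = rng.normal(size=d)
kappa = margin(w, X, y)  # a negative value means some points lie on the wrong side
```

Note that the margin is invariant to rescaling of $\boldsymbol{w}$, so only the direction of the candidate separator matters.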

# The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training

no code implementations, 25 Jul 2020

We assume that both the sample size $n$ and the dimension $d$ are large, and they are polynomially related.

# A Selective Overview of Deep Learning

no code implementations, 10 Apr 2019

Deep learning has arguably achieved tremendous success in recent years.

# Robust high dimensional factor models with applications to statistical machine learning

Factor models are a class of powerful statistical models that have been widely used to deal with dependent measurements arising in applications ranging from genomics and neuroscience to economics and finance.

# Differentially Private Data Releasing for Smooth Queries with Synthetic Database Output

We develop an $\epsilon$-differentially private mechanism for the class of $K$-smooth queries.
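As background on the privacy guarantee being targeted (this is the textbook Laplace mechanism for a single query with known sensitivity, not the paper's mechanism for the full class of $K$-smooth queries), an $\epsilon$-differentially private release can be sketched as follows:

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Standard Laplace mechanism: adding Lap(sensitivity / epsilon) noise
    to a query answer yields epsilon-differential privacy."""
    scale = sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(0)
# Example: a counting query has sensitivity 1, since changing one record
# changes the count by at most 1.
noisy_count = laplace_mechanism(true_answer=42.0, sensitivity=1.0,
                                epsilon=0.5, rng=rng)
```

The paper's contribution goes beyond this per-query scheme: it releases a synthetic database from which all $K$-smooth queries can be answered under a single privacy budget.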
