Search Results for author: Yufa Zhou

Found 8 papers, 0 papers with code

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

no code implementations • 15 Oct 2024 • Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou

Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants.

Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes

no code implementations • 12 Oct 2024 • Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

For small cache sizes, we provide an algorithm that improves over existing methods and achieves the tight bounds.
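
As a rough illustration of the I/O model in question (this is not the paper's algorithm), the toy accounting below compares the HBM traffic of a naive attention pass that materializes the full n x n score matrix against a FlashAttention-style tiled pass whose block size is chosen to fit a cache of M words; n, d, M, and the tile-sizing heuristic are illustrative assumptions.

```python
# Toy I/O accounting for attention over n tokens of dimension d with a
# cache of M words. Illustrative only -- not the paper's algorithm.

def naive_io(n: int, d: int) -> int:
    """HBM traffic (words) when softmax(QK^T)V is formed explicitly."""
    read_qkv = 3 * n * d      # load Q, K, V
    write_scores = n * n      # store S = QK^T
    read_scores = n * n       # reload S to compute softmax(S)V
    write_out = n * d         # store the output
    return read_qkv + write_scores + read_scores + write_out

def tiled_io(n: int, d: int, M: int) -> int:
    """HBM traffic for a FlashAttention-style tiled pass (rough model)."""
    Bc = max(1, M // (4 * d))          # K/V block length so a tile fits in cache
    kv_tiles = -(-n // Bc)             # ceil(n / Bc): number of K/V blocks
    stream_qo = kv_tiles * 2 * n * d   # Q and O are re-streamed once per K/V block
    read_kv = 2 * n * d                # K and V are each read once
    return stream_qo + read_kv

if __name__ == "__main__":
    n, d, M = 8192, 64, 64 * 1024
    print("naive :", naive_io(n, d))   # grows like n^2
    print("tiled :", tiled_io(n, d, M))  # grows like n^2 * d^2 / M
```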

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers

no code implementations • 12 Oct 2024 • Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou

In contrast, the multi-layer perceptron with $\mathsf{ReLU}$ activation ($\mathsf{ReLU}$-$\mathsf{MLP}$), one of the most fundamental components of neural networks, is known to be expressive; specifically, a two-layer neural network is a universal approximator given an exponentially large number of hidden neurons.
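
As a minimal sketch of this expressiveness claim (the width, target function, and random-feature fitting procedure are illustrative choices, not the paper's construction), a wide two-layer ReLU network with random hidden weights can already fit a simple 1-D target closely:

```python
import numpy as np

# Wide two-layer ReLU MLP fit to a 1-D target by ridge regression on the
# output layer. Illustrative of expressiveness, not the paper's method.
rng = np.random.default_rng(0)
n, width = 512, 4096

x = np.linspace(-np.pi, np.pi, n).reshape(-1, 1)
y = np.sin(3 * x) + 0.5 * np.cos(5 * x)          # target to approximate

W1 = rng.normal(size=(1, width))                 # random hidden weights
b1 = rng.uniform(-np.pi, np.pi, size=width)      # random biases
H = np.maximum(x @ W1 + b1, 0.0)                 # ReLU features, shape (n, width)

# Closed-form (ridge-regularised) least squares for the output layer.
W2 = np.linalg.solve(H.T @ H + 1e-3 * np.eye(width), H.T @ y)
err = np.max(np.abs(H @ W2 - y))
print(f"max approximation error with width {width}: {err:.4f}")
```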

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

no code implementations • 23 Aug 2024 • Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou

The computational complexity of the self-attention mechanism in popular transformer architectures poses significant challenges for training and inference, and becomes the bottleneck for long inputs.

Differential Privacy of Cross-Attention with Provable Guarantee

no code implementations • 20 Jul 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

In addition, our data structure can guarantee that the process of answering a user query satisfies $(\epsilon, \delta)$-DP with $\widetilde{O}(n^{-1} \epsilon^{-1} \alpha^{-1/2} R^{2s} R_w r^2)$ additive error and $n^{-1} (\alpha + \epsilon_s)$ relative error between our output and the true answer.
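
For context only, the snippet below shows a generic Gaussian-mechanism release of a bounded query answer under $(\epsilon, \delta)$-DP; it is not the paper's data structure, and the sensitivity and privacy parameters are illustrative.

```python
import numpy as np

# Generic Gaussian mechanism: add noise calibrated to (eps, delta)-DP.
# Not the paper's construction; parameters below are illustrative.
def noisy_answer(true_answer: np.ndarray, sensitivity: float,
                 eps: float, delta: float,
                 rng=np.random.default_rng(0)) -> np.ndarray:
    # Standard calibration: sigma = sensitivity * sqrt(2 ln(1.25/delta)) / eps
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return true_answer + rng.normal(scale=sigma, size=true_answer.shape)

answer = np.array([0.42, 1.30, -0.75])   # e.g. one cross-attention output row
private = noisy_answer(answer, sensitivity=0.1, eps=1.0, delta=1e-5)
print(private)
```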

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

no code implementations • 26 May 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

We prove that if the target distribution is a $k$-mixture of Gaussians, the density of the entire diffusion process will also be a $k$-mixture of Gaussians.
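
A short sketch of why this closure holds, assuming the standard forward noising $x_t = \alpha_t x_0 + \sigma_t \varepsilon$ (notation illustrative):

```latex
% If $p_0=\sum_{i=1}^{k} w_i\,\mathcal{N}(\mu_i,\Sigma_i)$ and
% $x_t = \alpha_t x_0 + \sigma_t \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$
% independent of $x_0$, then conditioning on the mixture component and using
% closure of Gaussians under affine maps and convolution:
\[
p_t(x)
  = \sum_{i=1}^{k} w_i \int \mathcal{N}\!\big(x;\,\alpha_t x_0,\,\sigma_t^2 I\big)\,
      \mathcal{N}\!\big(x_0;\,\mu_i,\,\Sigma_i\big)\, dx_0
  = \sum_{i=1}^{k} w_i\, \mathcal{N}\!\big(x;\,\alpha_t \mu_i,\;
      \alpha_t^{2}\Sigma_i + \sigma_t^{2} I\big),
\]
% so $p_t$ remains a $k$-mixture of Gaussians for every $t$.
```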

Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers

no code implementations • 26 May 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

Tensor Attention, a multi-view attention mechanism that captures high-order correlations across multiple modalities, can overcome the representational limitations of classical matrix attention.
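
As a toy illustration of the idea (shapes and the elementwise score/value composition are assumptions, not necessarily the paper's construction), third-order attention can score each query against pairs of keys drawn from two views and normalize over the joint index:

```python
import numpy as np

# Toy third-order ("tensor") attention over two key/value views.
# Illustrative composition only; quadratic cost in m1 * m2 per query.
def tensor_attention(Q, K1, V1, K2, V2):
    # Q: (n, d); K1, V1: (m1, d); K2, V2: (m2, d)
    # Joint scores over pairs (j, k): s[i, j, k] = <q_i, k1_j * k2_k>
    scores = np.einsum("id,jd,kd->ijk", Q, K1, K2) / np.sqrt(Q.shape[1])
    flat = scores.reshape(Q.shape[0], -1)
    probs = np.exp(flat - flat.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)        # softmax over all (j, k)
    probs = probs.reshape(Q.shape[0], K1.shape[0], K2.shape[0])
    # Values for a pair are combined elementwise: v1_j * v2_k
    return np.einsum("ijk,jd,kd->id", probs, V1, V2)

rng = np.random.default_rng(0)
n, m1, m2, d = 4, 5, 6, 8
out = tensor_attention(rng.normal(size=(n, d)),
                       rng.normal(size=(m1, d)), rng.normal(size=(m1, d)),
                       rng.normal(size=(m2, d)), rng.normal(size=(m2, d)))
print(out.shape)   # (4, 8): one d-dimensional output per query
```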

Differentially Private Attention Computation

no code implementations • 8 May 2023 • Yeqi Gao, Zhao Song, Xin Yang, Yufa Zhou

Large language models (LLMs), especially those based on the Transformer architecture, have had a profound impact on various aspects of daily life, such as natural language processing, content generation, research methodologies, and more.
