no code implementations • 9 Dec 2024 • Yifang Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
In this paper, we analyze the computational limitations of Mamba and State-space Models (SSMs) by using the circuit complexity framework.
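For readers unfamiliar with the models being analyzed, a minimal linear state-space recurrence can be sketched as below (a generic time-invariant SSM; Mamba additionally makes the parameters input-dependent, which this toy omits).

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Minimal discrete SSM: h_t = A h_{t-1} + B x_t,  y_t = C h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:              # x: 1-D sequence of scalar inputs
        h = A @ h + B * x_t    # state update
        ys.append(C @ h)       # readout
    return np.array(ys)
```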
no code implementations • 8 Dec 2024 • Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang
The application of transformer-based models to time series forecasting (TSF) tasks has long been a popular subject of study.
no code implementations • 7 Dec 2024 • Xiaoyu Li, Yuanpeng Li, Yingyu Liang, Zhenmei Shi, Zhao Song
Modern Hopfield networks (MHNs) have emerged as powerful tools in deep learning, capable of replacing components such as pooling layers, LSTMs, and attention mechanisms.
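As a reminder of the mechanism in question, the standard modern-Hopfield retrieval update (in the style of Ramsauer et al.) can be sketched as follows; the inverse temperature `beta` and the number of update steps are illustrative choices, not values from this paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hopfield_retrieve(X, q, beta=8.0, steps=3):
    """X: (d, N) stored patterns as columns; q: (d,) query to clean up."""
    for _ in range(steps):
        q = X @ softmax(beta * (X.T @ q))   # attention-like retrieval update
    return q
```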
no code implementations • 12 Nov 2024 • Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song
In this work, we establish a circuit complexity bound for Transformers with $\mathsf{RoPE}$ attention.
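For context, $\mathsf{RoPE}$ (rotary position embedding) rotates each pair of feature dimensions by a position-dependent angle before the attention inner product; a minimal sketch of standard RoPE (not the paper's circuit construction) is below.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to a float vector x of even dimension."""
    d = x.shape[0]
    theta = base ** (-np.arange(0, d, 2) / d)   # per-pair rotation frequencies
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out
```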
no code implementations • 15 Oct 2024 • Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
Recent empirical studies have identified fixed point iteration phenomena in deep neural networks, where the hidden state tends to stabilize after several layers, showing minimal change in subsequent layers.
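The phenomenon is easy to reproduce with a weight-tied toy map: applying the same contractive layer repeatedly drives the hidden state to a fixed point. The sketch below is illustrative only; the architecture, scaling, and input handling are assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
W = rng.standard_normal((d, d)) / (3 * np.sqrt(d))  # scaled so the map is contractive
x = rng.standard_normal(d)                           # fixed layer input (skip connection)
h = np.zeros(d)
for layer in range(30):
    h_next = np.tanh(W @ h + x)                      # same map applied at every "layer"
    print(layer, np.linalg.norm(h_next - h))         # differences shrink geometrically
    h = h_next
```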
no code implementations • 15 Oct 2024 • Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou
Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants.
no code implementations • 15 Oct 2024 • Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
Our results demonstrate that as long as the input data has a constant condition number, e.g., $n = O(d)$, the linear looped Transformers can achieve a small error by multi-step gradient descent during in-context learning.
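The gradient-descent dynamics that the looped architecture is shown to emulate are just multi-step gradient descent on the in-context least-squares objective; a hypothetical sketch (the step size and iteration count are illustrative) is:

```python
import numpy as np

def in_context_gd(X, y, steps=20):
    """Multi-step GD on (1/2n)||Xw - y||^2, the objective given by in-context examples."""
    n, d = X.shape
    eta = n / np.linalg.norm(X, 2) ** 2   # 1/L step size for this quadratic
    w = np.zeros(d)
    for _ in range(steps):
        w -= eta * X.T @ (X @ w - y) / n  # gradient step
    return w
```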
no code implementations • 14 Oct 2024 • Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
Our approach achieves a running time of $O(mn^{4/5})$, significantly faster than the naive $O(mn)$ approach for attention generation, where $n$ is the context length, $m$ is the query length, and $d$ is the hidden dimension.
no code implementations • 12 Oct 2024 • Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
In contrast, the multi-layer perceptron with $\mathsf{ReLU}$ activation ($\mathsf{ReLU}$-$\mathsf{MLP}$), one of the most fundamental components of neural networks, is known to be expressive; specifically, a two-layer neural network is a universal approximator given an exponentially large number of hidden neurons.
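A quick way to see this expressivity empirically is a wide two-layer ReLU network with random hidden weights and a least-squares fit of the output layer; this is a random-features simplification of the universal-approximation statement, and the target function and width below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 2000                                              # hidden width
x = np.linspace(-np.pi, np.pi, 512)[:, None]          # inputs, shape (n, 1)
W = rng.standard_normal((1, m))                       # random first-layer weights
b = rng.uniform(-np.pi, np.pi, size=m)                # random biases
H = np.maximum(x @ W + b, 0.0)                        # hidden ReLU features, (n, m)
a, *_ = np.linalg.lstsq(H, np.sin(x).ravel(), rcond=None)   # fit output layer only
max_err = np.max(np.abs(H @ a - np.sin(x).ravel()))   # shrinks as m grows
```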
no code implementations • 12 Oct 2024 • Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
For small cache sizes, we provide an algorithm that improves over existing methods and achieves the tight bounds.
1 code implementation • 25 Sep 2024 • Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, Shafiq Joty
Our research introduces a novel approach to the long-context bottleneck, accelerating LLM inference and reducing GPU memory consumption.
no code implementations • 23 Aug 2024 • Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
The computational complexity of the self-attention mechanism in popular transformer architectures poses significant challenges for training and inference, and becomes the bottleneck for long inputs.
no code implementations • 22 Aug 2024 • Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
In this work, we improve the analysis of the running time of SparseGPT [Frantar, Alistarh, ICML 2023] from $O(d^{3})$ to $O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1, 1, a)-a})$ for any $a \in [0, 1]$, where $\omega$ is the exponent of matrix multiplication.
no code implementations • 12 Aug 2024 • Jiuxiang Gu, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Junwei Yu
Determining the John ellipsoid, the largest-volume ellipsoid contained within a convex polytope, is a fundamental problem with applications in machine learning, optimization, and data analytics.
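For reference, the classical convex-programming formulation of the maximum-volume inscribed ellipsoid (Boyd and Vandenberghe, Sec. 8.4) can be written directly in CVXPY; this is the baseline problem statement, not the fast algorithm developed in the paper, and the polytope data below are made up.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 2))                 # polytope {x : A x <= b} (illustrative)
A /= np.linalg.norm(A, axis=1, keepdims=True)
b = np.ones(8)

d = A.shape[1]
B = cp.Variable((d, d), PSD=True)               # ellipsoid shape: {B u + c : ||u|| <= 1}
c = cp.Variable(d)                              # ellipsoid center
cons = [cp.norm(B @ A[i]) + A[i] @ c <= b[i] for i in range(A.shape[0])]
prob = cp.Problem(cp.Maximize(cp.log_det(B)), cons)
prob.solve()                                    # B.value, c.value describe the ellipsoid
```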
1 code implementation • 22 Jul 2024 • Zhuoyan Xu, Zhenmei Shi, Yingyu Liang
In this study, we delve into the ICL capabilities of LLMs on composite tasks, with only simple tasks as in-context examples.
no code implementations • 20 Jul 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
In addition, our data structure can guarantee that the process of answering user query satisfies $(\epsilon, \delta)$-DP with $\widetilde{O}(n^{-1} \epsilon^{-1} \alpha^{-1/2} R^{2s} R_w r^2)$ additive error and $n^{-1} (\alpha + \epsilon_s)$ relative error between our output and the true answer.
no code implementations • 18 Jul 2024 • Jiuxiang Gu, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
Training data privacy is a fundamental problem in modern Artificial Intelligence (AI) applications, such as face recognition, recommendation systems, language generation, and many others, since the training data may contain sensitive user information that raises legal concerns.
1 code implementation • 21 Jun 2024 • Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Yixuan Li, Neel Joshi
Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable performance across a wide range of tasks and domains.
1 code implementation • 20 Jun 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang
Prompting and context-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks.
no code implementations • 30 May 2024 • Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang
This sheds light on what transformers pay attention to and how that affects ICL.
no code implementations • 26 May 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
We prove that if the target distribution is a $k$-mixture of Gaussians, the density of the entire diffusion process will also be a $k$-mixture of Gaussians.
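Concretely, under the common variance-preserving forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$ (one standard convention; the paper's exact parameterization may differ), the mixture structure is preserved componentwise:

$$
x_0 \sim \sum_{i=1}^{k} w_i\,\mathcal{N}(\mu_i, \Sigma_i)
\quad\Longrightarrow\quad
x_t \sim \sum_{i=1}^{k} w_i\,\mathcal{N}\!\left(\sqrt{\bar\alpha_t}\,\mu_i,\ \bar\alpha_t \Sigma_i + (1-\bar\alpha_t) I\right).
$$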
no code implementations • 26 May 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
Tensor Attention, a multi-view attention that is able to capture high-order correlations among multiple modalities, can overcome the representational limitations of classical matrix attention.
no code implementations • 8 May 2024 • Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Zhuoyan Xu, Junze Yin
We then design a fast algorithm to approximate the attention matrix via a sum of $k$ such convolution matrices.
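To see why such a structure is fast to apply, each causal convolution matrix acts on a vector as a truncated 1-D convolution, so a sum of $k$ of them costs $O(k\,n \log n)$ with FFT-based convolution rather than $O(n^2)$ per explicit matrix. The sketch below shows that primitive only, not the paper's approximation algorithm.

```python
import numpy as np
from scipy.signal import fftconvolve

def apply_conv_sum(kernels, v):
    """Compute (sum_j C_j) v, where C_j is the causal (lower-triangular Toeplitz)
    convolution matrix whose first column is kernels[j]."""
    n = v.shape[0]
    out = np.zeros(n)
    for c in kernels:                   # k kernels, each of length <= n
        out += fftconvolve(c, v)[:n]    # first n entries = causal convolution
    return out
```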
no code implementations • 6 May 2024 • Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song
The softmax activation function plays a crucial role in the success of large language models (LLMs), particularly in the self-attention mechanism of the widely adopted Transformer architecture.
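As a reminder, the mechanism in question is the row-wise softmax inside scaled dot-product attention; a minimal single-head sketch:

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Minimal single-head scaled dot-product attention."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    P = np.exp(scores)
    P /= P.sum(axis=-1, keepdims=True)             # row-wise softmax
    return P @ V
```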
1 code implementation • 22 Feb 2024 • Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang
An emerging solution with recent success in vision and NLP involves finetuning a foundation model on a selection of relevant tasks, before its adaptation to a target task with limited labeled samples.
no code implementations • 12 Feb 2024 • Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Tianyi Zhou
We direct our focus to the complex algebraic learning task of modular addition involving $k$ inputs.
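For concreteness, the task is: given $k$ residues, predict their sum modulo $p$. A hypothetical data-generation sketch (the values of $p$, $k$, and the sample count are illustrative, not taken from the paper):

```python
import numpy as np

def modular_addition_dataset(p=23, k=3, n=10000, seed=0):
    """Inputs: k integers in Z_p; label: their sum mod p."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, p, size=(n, k))
    y = X.sum(axis=1) % p
    return X, y
```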
1 code implementation • NeurIPS 2023 • Yiyou Sun, Zhenmei Shi, Yixuan Li
Open-world semi-supervised learning aims at inferring both known and novel classes in unlabeled data, by harnessing prior knowledge from a labeled set with known classes.
1 code implementation • 9 Aug 2023 • Yiyou Sun, Zhenmei Shi, Yingyu Liang, Yixuan Li
This paper bridges the gap by providing an analytical framework to formalize and investigate when and how known classes can help discover novel classes.
1 code implementation • 13 Mar 2023 • Zhenmei Shi, Yifei Ming, Ying Fan, Frederic Sala, Yingyu Liang
In this paper, we propose a simple and effective regularization method based on the nuclear norm of the learned features for domain generalization.
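In code, such a regularizer amounts to adding the nuclear norm of the feature matrix to the training loss; a minimal PyTorch sketch (the weighting `lam` and where the features are taken from are assumptions, not details from the paper):

```python
import torch

def nuclear_norm_penalty(features: torch.Tensor) -> torch.Tensor:
    """features: (batch, dim) matrix of learned representations."""
    return torch.linalg.matrix_norm(features, ord="nuc")

# Hypothetical usage inside a training step:
# loss = task_loss + lam * nuclear_norm_penalty(feats)
```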
1 code implementation • 28 Feb 2023 • Zhenmei Shi, Jiefeng Chen, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, Somesh Jha
Pre-training representations (a.k.a. foundation models) has recently become a prevalent learning paradigm, where one first pre-trains a representation using large-scale unlabeled data, and then learns simple predictors on top of the representation using small labeled data from the downstream tasks.
no code implementations • ICLR 2022 • Zhenmei Shi, Junyi Wei, Yingyu Liang
These results provide theoretical evidence that feature learning in neural networks depends strongly on the input structure and leads to superior performance.
1 code implementation • 6 Oct 2021 • Mehmet F. Demirel, Shengchao Liu, Siddhant Garg, Zhenmei Shi, Yingyu Liang
Our experiments demonstrate the strong performance of AWARE on graph-level prediction tasks in the standard setting, spanning molecular property prediction and social network domains.
1 code implementation • 2 Feb 2021 • Zhenmei Shi, Fuhao Shi, Wei-Sheng Lai, Chia-Kai Liang, Yingyu Liang
We present a deep neural network (DNN) that uses both sensor data (gyroscope) and image content (optical flow) to stabilize videos through unsupervised learning.
no code implementations • 4 Aug 2019 • Zhaoyang Yang, Zhenmei Shi, Xiaoyong Shen, Yu-Wing Tai
The proposed SF-Net extracts features in a structured manner and gradually encodes information at the frame level, the gloss level and the sentence level into the feature representation.
no code implementations • 2 Aug 2019 • Zhenmei Shi, Haoyang Fang, Yu-Wing Tai, Chi-Keung Tang
Our Dual Augmented Memory Network (DAWN) is unique in remembering both the target and the background, using an improved attention LSTM memory to guide the focus on memorized features.