Search Results for author: Zhenmei Shi

Found 35 papers, 11 papers with code

The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity

no code implementations • 9 Dec 2024 • Yifang Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

In this paper, we analyze the computational limitations of Mamba and State-space Models (SSMs) by using the circuit complexity framework.

Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond

no code implementations • 8 Dec 2024 • Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang

The application of transformer-based models to time series forecasting (TSF) tasks has long been a popular topic of study.

On the Expressive Power of Modern Hopfield Networks

no code implementations • 7 Dec 2024 • Xiaoyu Li, Yuanpeng Li, Yingyu Liang, Zhenmei Shi, Zhao Song

Modern Hopfield networks (MHNs) have emerged as powerful tools in deep learning, capable of replacing components such as pooling layers, LSTMs, and attention mechanisms.

Circuit Complexity Bounds for RoPE-based Transformer Architecture

no code implementations • 12 Nov 2024 • Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song

In this work, we establish a circuit complexity bound for Transformers with $\mathsf{RoPE}$ attention.

Position

Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study

no code implementations • 15 Oct 2024 • Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

Recent empirical studies have identified fixed point iteration phenomena in deep neural networks, where the hidden state tends to stabilize after several layers, showing minimal change in subsequent layers.
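
A minimal numerical sketch of this stabilization effect, under the assumption that the layer map is a contraction (a toy weight-tied ReLU layer, not the setting analyzed in the paper): because ReLU is 1-Lipschitz, scaling the weight matrix to spectral norm below one makes repeated application converge to a fixed point, so the per-layer change shrinks geometrically.

import numpy as np

rng = np.random.default_rng(0)
d = 64
W = rng.standard_normal((d, d))
W *= 0.9 / np.linalg.norm(W, 2)          # spectral norm 0.9, so the layer map is a contraction
b = rng.standard_normal(d)

h = rng.standard_normal(d)
for layer in range(20):
    h_next = np.maximum(W @ h + b, 0.0)  # the same (weight-tied) ReLU layer applied repeatedly
    print(layer, np.linalg.norm(h_next - h))   # change per layer decays geometrically
    h = h_next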

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

no code implementations • 15 Oct 2024 • Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou

Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants.

Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent

no code implementations • 15 Oct 2024 • Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

Our results demonstrate that, as long as the input data has a constant condition number, e.g., $n = O(d)$, linear looped Transformers can achieve a small error via multi-step gradient descent during in-context learning.

In-Context Learning
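
A minimal sketch of the mechanism referenced above (a toy numpy experiment, not the paper's looped-Transformer construction): on a well-conditioned in-context least-squares problem with $n = 2d$ examples, plain multi-step gradient descent drives the parameter error down geometrically, which is the behavior the looped Transformer is shown to emulate.

import numpy as np

rng = np.random.default_rng(1)
d, n = 16, 32                              # n = O(d) in-context examples
X = rng.standard_normal((n, d))            # well conditioned with high probability
w_star = rng.standard_normal(d)
y = X @ w_star

L = np.linalg.eigvalsh(X.T @ X / n).max()  # smoothness constant of the least-squares loss
w = np.zeros(d)
for step in range(41):
    grad = X.T @ (X @ w - y) / n           # one gradient step on the in-context examples
    w -= grad / L
    if step % 10 == 0:
        print(step, np.linalg.norm(w - w_star))   # error decays geometrically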

HSR-Enhanced Sparse Attention Acceleration

no code implementations • 14 Oct 2024 • Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song

Our approach achieves a running time of $O(mn^{4/5})$, significantly faster than the naive $O(mn)$ approach for attention generation, where $n$ is the context length, $m$ is the query length, and $d$ is the hidden dimension.
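
Ignoring constants, log factors, and the $d$-dependence, the quoted bounds translate into a speedup of roughly $n^{1/5}$ per query token over the naive cost; a back-of-the-envelope illustration (not a measurement of the paper's algorithm):

# rough per-query-token work: naive ~ n, HSR-enhanced ~ n^(4/5); constants and d-dependence ignored
for n in (10**4, 10**5, 10**6, 10**7):
    print(f"n = {n:>8}: naive / sparse ~ {n / n**0.8:5.1f}x")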

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers

no code implementations • 12 Oct 2024 • Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou

In contrast, the multi-layer perceptron with $\mathsf{ReLU}$ activation ($\mathsf{ReLU}$-$\mathsf{MLP}$), one of the most fundamental components of neural networks, is known to be expressive; specifically, a two-layer neural network is a universal approximator given an exponentially large number of hidden neurons.
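
The expressivity fact quoted above can be illustrated with the classical piecewise-linear construction (a generic sketch, not the paper's result): treating a two-layer ReLU network as a linear model over the features $\mathrm{relu}(x - t_i)$ and fitting the output weights by least squares approximates a smooth 1-D function to high accuracy once enough hidden units are used.

import numpy as np

# Two-layer ReLU network viewed as a linear model over features relu(x - t_i);
# the output-layer weights are fit by least squares on a dense grid.
f = np.sin
knots = np.linspace(0, np.pi, 64)             # one hidden unit per knot
xs = np.linspace(0, np.pi, 2000)
Phi = np.hstack([np.ones((len(xs), 1)), np.maximum(xs[:, None] - knots[None, :], 0.0)])
w, *_ = np.linalg.lstsq(Phi, f(xs), rcond=None)

xt = np.linspace(0, np.pi, 10_000)            # held-out evaluation grid
Pt = np.hstack([np.ones((len(xt), 1)), np.maximum(xt[:, None] - knots[None, :], 0.0)])
print("max error:", np.abs(Pt @ w - f(xt)).max())   # shrinks as the number of hidden units grows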

Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes

no code implementations • 12 Oct 2024 • Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

For small cache sizes, we provide an algorithm that improves over existing methods and achieves the tight bounds.

Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction

1 code implementation • 25 Sep 2024 • Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, Shafiq Joty

Our research introduces a novel approach for the long context bottleneck to accelerate LLM inference and reduce GPU memory consumption.

Token Reduction

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

no code implementations • 23 Aug 2024 • Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou

The computational complexity of the self-attention mechanism in popular transformer architectures poses significant challenges for training and inference, and becomes the bottleneck for long inputs.

A Tighter Complexity Analysis of SparseGPT

no code implementations • 22 Aug 2024 • Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

In this work, we improved the analysis of the running time of SparseGPT [Frantar, Alistarh ICML 2023] from $O(d^{3})$ to $O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1, 1, a)-a})$ for any $a \in [0, 1]$, where $\omega$ is the exponent of matrix multiplication.

Fast John Ellipsoid Computation with Differential Privacy Optimization

no code implementations • 12 Aug 2024 • Jiuxiang Gu, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Junwei Yu

Determining the John ellipsoid - the largest volume ellipsoid contained within a convex polytope - is a fundamental problem with applications in machine learning, optimization, and data analytics.

Privacy Preserving
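
For context, the (non-private) John ellipsoid $\{Bu + d : \|u\|_2 \le 1\}$ of a polytope $\{x : Ax \le b\}$ is the solution of a standard convex program: maximize $\log\det B$ subject to $\|B a_i\|_2 + a_i^\top d \le b_i$ for every row $a_i$. A minimal cvxpy sketch of that generic formulation (it is not the paper's differentially private algorithm, and the example polytope is an arbitrary choice):

import numpy as np
import cvxpy as cp

# example polytope {x : A x <= b}: the unit box in 2-D (arbitrary choice for the demo)
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

B = cp.Variable((2, 2), PSD=True)     # ellipsoid shape
d = cp.Variable(2)                    # ellipsoid center
constraints = [cp.norm(B @ A[i], 2) + A[i] @ d <= b[i] for i in range(len(b))]
prob = cp.Problem(cp.Maximize(cp.log_det(B)), constraints)
prob.solve()
print("center:", d.value)             # expect ~[0, 0]
print("shape:\n", B.value)            # expect ~identity, i.e. the unit disk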

Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability

1 code implementation • 22 Jul 2024 • Zhuoyan Xu, Zhenmei Shi, Yingyu Liang

In this study, we delve into the ICL capabilities of LLMs on composite tasks, with only simple tasks as in-context examples.

In-Context Learning

Differential Privacy of Cross-Attention with Provable Guarantee

no code implementations • 20 Jul 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

In addition, our data structure can guarantee that the process of answering user query satisfies $(\epsilon, \delta)$-DP with $\widetilde{O}(n^{-1} \epsilon^{-1} \alpha^{-1/2} R^{2s} R_w r^2)$ additive error and $n^{-1} (\alpha + \epsilon_s)$ relative error between our output and the true answer.

RAG

Differential Privacy Mechanisms in Neural Tangent Kernel Regression

no code implementations • 18 Jul 2024 • Jiuxiang Gu, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song

Training data privacy is a fundamental problem in modern Artificial Intelligence (AI) applications such as face recognition, recommendation systems, and language generation, since the training data may contain sensitive user information with legal implications.

Face Recognition • Image Classification +3

Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models

1 code implementation • 21 Jun 2024 • Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Yixuan Li, Neel Joshi

Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable performance across a wide range of tasks and domains.

Spatial Reasoning

Towards Infinite-Long Prefix in Transformer

1 code implementation • 20 Jun 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang

Prompting and context-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks.

Math • parameter-efficient fine-tuning

Why Larger Language Models Do In-context Learning Differently?

no code implementations • 30 May 2024 • Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang

This sheds light on what transformers pay attention to and how that affects ICL.

In-Context Learning

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

no code implementations • 26 May 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

We prove that if the target distribution is a $k$-mixture of Gaussians, the density of the entire diffusion process will also be a $k$-mixture of Gaussians.
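
This closure property has a short justification under the standard linear forward process (the notation here is generic, not necessarily the paper's): if $x_t = \alpha_t x_0 + \sigma_t \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$ independent of $x_0$, and $p_0 = \sum_{i=1}^{k} w_i \mathcal{N}(\mu_i, \Sigma_i)$, then conditioning on the mixture component and adding the independent Gaussian noise gives $p_t = \sum_{i=1}^{k} w_i \mathcal{N}(\alpha_t \mu_i, \alpha_t^2 \Sigma_i + \sigma_t^2 I)$, which is again a $k$-mixture of Gaussians for every $t$, with only the component means and covariances rescaled.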

Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers

no code implementations • 26 May 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

Tensor Attention, a multi-view attention that is able to capture high-order correlations among multiple modalities, can overcome the representational limitations of classical matrix attention.

Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers

no code implementations • 8 May 2024 • Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Zhuoyan Xu, Junze Yin

We then design a fast algorithm to approximate the attention matrix via a sum of $k$ such convolution matrices.
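
The payoff of such a decomposition is that a convolution (lower-triangular Toeplitz) matrix can be applied to a vector in $O(n \log n)$ time with the FFT, so a sum of $k$ of them costs $O(kn \log n)$ instead of $O(n^2)$. A minimal sketch of that primitive only (the decomposition of the attention matrix itself is the paper's contribution and is not reproduced here):

import numpy as np

def causal_conv_matvec(c, v):
    # multiply the lower-triangular Toeplitz matrix with first column c by v
    # in O(n log n) time via zero-padded FFTs
    n = len(v)
    return np.fft.ifft(np.fft.fft(c, 2 * n) * np.fft.fft(v, 2 * n))[:n].real

rng = np.random.default_rng(2)
n = 1024
c, v = rng.standard_normal(n), rng.standard_normal(n)
T = np.array([[c[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)])
print(np.allclose(T @ v, causal_conv_matvec(c, v)))   # True: matches the dense O(n^2) product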

Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond

no code implementations • 6 May 2024 • Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song

The softmax activation function plays a crucial role in the success of large language models (LLMs), particularly in the self-attention mechanism of the widely adopted Transformer architecture.

Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning

1 code implementation • 22 Feb 2024 • Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang

An emerging solution with recent success in vision and NLP involves finetuning a foundation model on a selection of relevant tasks, before its adaptation to a target task with limited labeled samples.

A Graph-Theoretic Framework for Understanding Open-World Semi-Supervised Learning

1 code implementation • NeurIPS 2023 • Yiyou Sun, Zhenmei Shi, Yixuan Li

Open-world semi-supervised learning aims at inferring both known and novel classes in unlabeled data, by harnessing prior knowledge from a labeled set with known classes.

Clustering • Open-World Semi-Supervised Learning +1

When and How Does Known Class Help Discover Unknown Ones? Provable Understanding Through Spectral Analysis

1 code implementation • 9 Aug 2023 • Yiyou Sun, Zhenmei Shi, Yingyu Liang, Yixuan Li

This paper bridges the gap by providing an analytical framework to formalize and investigate when and how known classes can help discover novel classes.

Novel Class Discovery

Domain Generalization via Nuclear Norm Regularization

1 code implementation • 13 Mar 2023 • Zhenmei Shi, Yifei Ming, Ying Fan, Frederic Sala, Yingyu Liang

In this paper, we propose a simple and effective regularization method based on the nuclear norm of the learned features for domain generalization.

Domain Generalization
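
A minimal PyTorch sketch of the general idea, adding a nuclear-norm penalty on the batch's feature matrix to the task loss (the architecture, coefficient, and placement of the penalty are illustrative assumptions, not the paper's exact recipe):

import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
head = nn.Linear(64, 10)
opt = torch.optim.SGD(list(encoder.parameters()) + list(head.parameters()), lr=1e-2)
lam = 0.01                                    # regularization strength (assumed)

x = torch.randn(32, 128)                      # dummy batch of source-domain inputs
y = torch.randint(0, 10, (32,))
feats = encoder(x)                            # (batch, feature_dim) feature matrix
loss = F.cross_entropy(head(feats), y)
loss = loss + lam * torch.linalg.matrix_norm(feats, ord='nuc')   # nuclear norm of the features
loss.backward()
opt.step()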

The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning

1 code implementation • 28 Feb 2023 • Zhenmei Shi, Jiefeng Chen, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, Somesh Jha

Pre-training representations (a.k.a. foundation models) has recently become a prevalent learning paradigm, where one first pre-trains a representation using large-scale unlabeled data, and then learns simple predictors on top of the representation using small labeled data from the downstream tasks.

Contrastive Learning

A Theoretical Analysis on Feature Learning in Neural Networks: Emergence from Inputs and Advantage over Fixed Features

no code implementations • ICLR 2022 • Zhenmei Shi, Junyi Wei, Yingyu Liang

These results provide theoretical evidence that feature learning in neural networks depends strongly on the input structure and leads to superior performance.

Attentive Walk-Aggregating Graph Neural Networks

1 code implementation • 6 Oct 2021 • Mehmet F. Demirel, Shengchao Liu, Siddhant Garg, Zhenmei Shi, Yingyu Liang

Our experiments demonstrate the strong performance of AWARE in graph-level prediction tasks in the standard setting in the domains of molecular property prediction and social networks.

Molecular Property Prediction • Property Prediction

Deep Online Fused Video Stabilization

1 code implementation • 2 Feb 2021 • Zhenmei Shi, Fuhao Shi, Wei-Sheng Lai, Chia-Kai Liang, Yingyu Liang

We present a deep neural network (DNN) that uses both sensor data (gyroscope) and image content (optical flow) to stabilize videos through unsupervised learning.

Video Stabilization

SF-Net: Structured Feature Network for Continuous Sign Language Recognition

no code implementations • 4 Aug 2019 • Zhaoyang Yang, Zhenmei Shi, Xiaoyong Shen, Yu-Wing Tai

The proposed SF-Net extracts features in a structured manner and gradually encodes information at the frame level, the gloss level and the sentence level into the feature representation.

Sentence • Sign Language Recognition

DAWN: Dual Augmented Memory Network for Unsupervised Video Object Tracking

no code implementations • 2 Aug 2019 • Zhenmei Shi, Haoyang Fang, Yu-Wing Tai, Chi-Keung Tang

Our Dual Augmented Memory Network (DAWN) is unique in remembering both target and background, and using an improved attention LSTM memory to guide the focus on memorized features.

Video Object Tracking • Visual Tracking
