Search Results for author: Shanda Li

Found 7 papers, 3 papers with code

Functional Interpolation for Relative Positions Improves Long Context Transformers

no code implementations • 6 Oct 2023 • Shanda Li, Chong You, Guru Guruganesh, Joshua Ainslie, Santiago Ontanon, Manzil Zaheer, Sumit Sanghai, Yiming Yang, Sanjiv Kumar, Srinadh Bhojanapalli

Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models.

Language Modelling • Position
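
The snippet above only states the motivation, so as a heavily hedged illustration of the general idea named in the title, the sketch below adds an attention bias produced by a small learned function of normalized relative distances. The normalization scheme, the MLP, and the name `FunctionalRelPosBias` are assumptions made for illustration, not the paper's exact formulation.

```python
# Hedged sketch: a learned function of normalized relative positions produces
# an additive attention bias. Illustration of the general idea only.
import torch
import torch.nn as nn

class FunctionalRelPosBias(nn.Module):  # hypothetical name
    def __init__(self, num_heads: int, hidden_dim: int = 32):
        super().__init__()
        # Small MLP mapping a scalar (normalized relative distance)
        # to one bias value per attention head.
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_heads),
        )

    def forward(self, seq_len: int) -> torch.Tensor:
        q = torch.arange(seq_len).unsqueeze(1)      # query positions i
        k = torch.arange(seq_len).unsqueeze(0)      # key positions j
        rel = (q - k).float()                       # relative distance i - j
        # Normalize by the query position so the input stays bounded even for
        # sequences longer than those seen in training (one plausible way to
        # "interpolate"; assumption, not taken from the paper).
        rel = rel / (q.float() + 1.0)
        bias = self.mlp(rel.unsqueeze(-1))          # (L, L, num_heads)
        return bias.permute(2, 0, 1)                # (num_heads, L, L)

# Usage: add the bias to the attention logits before the softmax.
bias = FunctionalRelPosBias(num_heads=8)(seq_len=16)   # (8, 16, 16)
```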

Is $L^2$ Physics-Informed Loss Always Suitable for Training Physics-Informed Neural Network?

1 code implementation • 4 Jun 2022 • Chuwei Wang, Shanda Li, Di He, LiWei Wang

In particular, we leverage the concept of stability from the literature on partial differential equations to study the asymptotic behavior of the learned solution as the loss approaches zero.
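
For context, a standard physics-informed loss penalizes the squared (L²) PDE residual at collocation points; the minimal sketch below shows that baseline for a toy 1-D Poisson problem u''(x) = f(x). The specific PDE, right-hand side, and function names are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of the standard L^2 physics-informed (PINN) residual loss
# for a toy 1-D Poisson problem u''(x) = f(x). Illustrative only.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))

def l2_physics_loss(x: torch.Tensor) -> torch.Tensor:
    x = x.requires_grad_(True)
    u = net(x)
    # First and second derivatives of the network output via autograd.
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    f = torch.sin(x)                      # assumed right-hand side
    residual = d2u - f                    # PDE residual
    return (residual ** 2).mean()         # L^2 (mean squared) penalty

x = torch.rand(128, 1)                    # collocation points in [0, 1]
loss = l2_physics_loss(x)
loss.backward()
```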

Your Transformer May Not be as Powerful as You Expect

1 code implementation • 26 May 2022 • Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, LiWei Wang, Di He

Extensive experiments covering typical architectures and tasks demonstrate that our model is parameter-efficient and can achieve superior performance to strong baselines in a wide range of applications.

Learning Physics-Informed Neural Networks without Stacked Back-propagation

1 code implementation • 18 Feb 2022 • Di He, Shanda Li, Wenlei Shi, Xiaotian Gao, Jia Zhang, Jiang Bian, LiWei Wang, Tie-Yan Liu

In this work, we develop a novel approach that can significantly accelerate the training of Physics-Informed Neural Networks.
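
"Stacked back-propagation" refers to the nested automatic differentiation a PINN normally needs to obtain the PDE derivatives inside its loss (as in the `autograd.grad` calls in the earlier sketch). One known way to avoid it is to estimate derivatives of a Gaussian-smoothed network by Monte-Carlo sampling (Stein's identity); the sketch below shows that estimator purely as a hedged illustration of a derivative-without-nested-backprop idea, and is not claimed to be the paper's exact algorithm. `sigma` and `num_samples` are assumed hyperparameters.

```python
# Hedged sketch: estimating the input derivative of a Gaussian-smoothed
# network via Monte-Carlo sampling (Stein's identity), using forward passes
# only, so no extra backward pass is needed to obtain du/dx.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
sigma = 0.1          # smoothing scale (assumed)
num_samples = 64     # Monte-Carlo samples (assumed)

def smoothed_derivative(x: torch.Tensor) -> torch.Tensor:
    """Estimate d/dx of E_eps[net(x + eps)], eps ~ N(0, sigma^2)."""
    eps = sigma * torch.randn(num_samples, *x.shape)     # (S, N, 1)
    outputs = net(x.unsqueeze(0) + eps)                  # forward passes only
    # Stein's identity: grad = E[eps * f(x + eps)] / sigma^2
    return (eps * outputs).mean(dim=0) / sigma ** 2      # (N, 1)

x = torch.rand(8, 1)
du_estimate = smoothed_derivative(x)    # no autograd.grad call required
```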

Can Vision Transformers Perform Convolution?

no code implementations • 2 Nov 2021 • Shanda Li, Xiangning Chen, Di He, Cho-Jui Hsieh

Several recent studies have demonstrated that attention-based networks, such as Vision Transformer (ViT), can outperform Convolutional Neural Networks (CNNs) on several computer vision tasks without using convolutional layers.

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

no code implementations • NeurIPS 2021 • Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, LiWei Wang, Tie-Yan Liu

Since relative positional encoding (RPE) is used by default in many state-of-the-art models, designing efficient Transformers that can incorporate RPE is appealing.
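
For reference, "kernelized attention" replaces the softmax with a feature map φ so that attention can be computed in linear time; the minimal sketch below shows that baseline without any RPE, assuming a simple elu+1 feature map. How to fold relative positional encoding into this efficiently is exactly the question the paper addresses and is not shown here.

```python
# Hedged sketch of plain kernelized (linear) attention with an elu+1 feature
# map; relative positional encoding is NOT incorporated in this sketch.
import torch
import torch.nn.functional as F

def kernelized_attention(q, k, v):
    # q, k, v: (batch, seq_len, dim)
    phi_q = F.elu(q) + 1.0                 # feature map keeps values positive
    phi_k = F.elu(k) + 1.0
    kv = torch.einsum("bnd,bne->bde", phi_k, v)              # sum_n phi(k_n) v_n^T
    z = 1.0 / (torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1)) + 1e-6)
    return torch.einsum("bnd,bde,bn->bne", phi_q, kv, z)     # normalized output

q = k = v = torch.randn(2, 16, 32)
out = kernelized_attention(q, k, v)        # (2, 16, 32)
```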
