Search Results for author: Weigao Sun

Found 10 papers, 7 papers with code

Scaling Laws for Linear Complexity Language Models

1 code implementation · 24 Jun 2024 · Xuyang Shen, Dong Li, Ruitao Leng, Zhen Qin, Weigao Sun, Yiran Zhong

In this study, we present the scaling laws for linear complexity language models to establish a foundation for their scalability.

Information Retrieval, Retrieval

HGRN2: Gated Linear RNNs with State Expansion

2 code implementations · 11 Apr 2024 · Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong

Hierarchically gated linear RNN (HGRN; Qin et al., 2023) has demonstrated competitive training speed and performance in language modeling while offering efficient inference.

Image Classification, Language Modelling
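
To make "gated linear RNN" concrete, here is a minimal sketch of a generic elementwise gated linear recurrence, h_t = f_t * h_{t-1} + (1 - f_t) * x_t, in plain Python. It is an illustrative simplification, not HGRN or HGRN2 as published. The gate parameterization (a sigmoid of hypothetical per-channel parameters `w_gate` and `b_gate`) is an assumption, and the sketch includes no output gating or state expansion. Because the update is linear in h_{t-1}, inference carries only a fixed-size state from one token to the next.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_linear_rnn(
    xs: Sequence[Sequence[float]],
    w_gate: Sequence[float],
    b_gate: Sequence[float],
) -> List[List[float]]:
    """Run an elementwise gated linear recurrence over a sequence of vectors.

    Hypothetical parameterization (not HGRN's exact formulation):
        f_t = sigmoid(w_gate * x_t + b_gate)      # forget gate in (0, 1)
        h_t = f_t * h_{t-1} + (1 - f_t) * x_t     # linear in h_{t-1}
    """
    dim = len(w_gate)
    h = [0.0] * dim  # fixed-size state, independent of sequence length
    outputs: List[List[float]] = []
    for x in xs:
        f = [sigmoid(w * xi + b) for w, xi, b in zip(w_gate, x, b_gate)]
        h = [fi * hi + (1.0 - fi) * xi for fi, hi, xi in zip(f, h, x)]
        outputs.append(list(h))
    return outputs
```

With `w_gate = b_gate = [0.0]`, every gate equals 0.5, so a constant input of 1.0 drives the state through 0.5, 0.75, and so on toward 1.0. A larger `b_gate` pushes f_t toward 1 and makes the state retain more history.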

Linear Attention Sequence Parallelism

1 code implementation · 3 Apr 2024 · Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong

In this paper, we introduce Linear Attention Sequence Parallel (LASP), an efficient sequence parallelism (SP) method tailored to linear attention-based language models.

MS-Net: A Multi-Path Sparse Model for Motion Prediction in Multi-Scenes

no code implementations · 1 Mar 2024 · Xiaqiang Tang, Weigao Sun, Siyuan Hu, Yiyang Sun, Yafeng Guo

In the training stage, motion prediction across differentiated scenes is abstracted as a multi-task learning problem, and an evolutionary algorithm is designed to guide the network's search for the optimal parameters for each scene while sharing common knowledge across scenes.

Autonomous Driving, motion prediction, +1

CO2: Efficient Distributed Training with Full Communication-Computation Overlap

1 code implementation · 29 Jan 2024 · Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong

CO2 attains high scalability even on large multi-node clusters with very limited communication bandwidth.

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

1 code implementation · 9 Jan 2024 · Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

With its ability to process tokens in linear computational complexity, linear attention can, in theory, handle sequences of unlimited length without sacrificing speed, i.e., it maintains a constant training speed across sequence lengths with fixed memory consumption.
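
The constant-memory property rests on the fact that causal linear attention can be rewritten as a recurrence over a fixed-size state. The sketch below assumes unnormalized linear attention with no feature map and no decay, simplifications chosen here for clarity; it does not reproduce the paper's GPU kernel. It computes the same outputs two ways: pairwise, o_t = sum over s <= t of (q_t · k_s) v_s, and recurrently, with a running state S_t = S_{t-1} + k_t v_t^T read out as o_t = q_t^T S_t. The function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def linear_attention_quadratic(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Causal linear attention computed pairwise: O(n^2) time in sequence length n."""
    out = []
    for t in range(len(q)):
        o = [0.0] * len(v[0])
        for s in range(t + 1):
            score = sum(a * b for a, b in zip(q[t], k[s]))  # q_t . k_s, no softmax
            o = [oi + score * vi for oi, vi in zip(o, v[s])]
        out.append(o)
    return out


def linear_attention_recurrent(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Same outputs via a running d_k x d_v state: O(n) time, memory independent of n."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # S_t; its size never grows with n
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        # S_t = S_{t-1} + k_t v_t^T (accumulate the outer product)
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k_t[i] * v_t[j]
        # o_t = q_t^T S_t
        out.append([sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out
```

The two functions agree because q_t^T S_t expands to the sum over s <= t of (q_t · k_s) v_s. Only the recurrent form keeps per-step cost and memory fixed as the sequence grows.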

TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

2 code implementations · 27 Jul 2023 · Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong

TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization.

Language Modelling, Large Language Model

PowerSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization

no code implementations · 25 Sep 2019 · Jun Liu, Beitong Zhou, Weigao Sun, Ruijuan Chen, Claire J. Tomlin, Ye Yuan

In this paper, we propose a novel technique for improving the stochastic gradient descent (SGD) method to train deep networks, which we term PowerSGD.
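
As a rough sketch, assume that "powered" SGD applies the elementwise power map sign(g) * |g|^gamma to the stochastic gradient before the usual descent step, so that gamma = 1 recovers plain SGD. That reading of the method is an assumption here. The helper names, learning rate, and gamma values below are illustrative and not settings from the paper.

```python
import math
from typing import List


def powered(g: float, gamma: float) -> float:
    """Elementwise power map sign(g) * |g|**gamma (assumed form of the powered gradient)."""
    if g == 0.0:
        return 0.0  # sign(0) = 0, so the zero gradient stays zero for any gamma
    return math.copysign(abs(g) ** gamma, g)


def power_sgd_step(params: List[float], grads: List[float], lr: float, gamma: float) -> List[float]:
    """One descent step using the powered gradient; gamma == 1.0 reduces to plain SGD."""
    return [p - lr * powered(g, gamma) for p, g in zip(params, grads)]
```

For 0 < gamma < 1, the map enlarges gradient components with |g| < 1 and shrinks those with |g| > 1. On the toy objective 0.5 * x^2 (gradient x) starting from x = 4 with lr = 0.5, plain SGD moves to 2.0, while gamma = 0.5 moves to 3.0.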
