Search Results for author: Yuntao Gui

Found 2 papers, 1 paper with code

SPT: Fine-Tuning Transformer-based Language Models Efficiently with Sparsification

1 code implementation • 16 Dec 2023 • Yuntao Gui, Xiao Yan, Peiqi Yin, Han Yang, James Cheng

Thus, we design the sparse MHA module, which computes and stores only large attention weights to reduce memory consumption, and the routed FFN module, which dynamically activates a subset of model parameters for each token to reduce computation cost.

Tasks: Quantization
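
The snippet above describes two mechanisms: a sparse MHA that keeps only large attention weights, and a routed FFN that activates a per-token subset of parameters. Below is a minimal PyTorch sketch of both ideas, not the paper's implementation: it assumes top-k per-query selection as the criterion for "large" attention weights, and a learned softmax router over equally sized FFN parameter blocks (mixture-of-experts style). All class and parameter names (TopKSparseAttention, RoutedFFN, num_blocks, top_blocks) are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKSparseAttention(nn.Module):
    """Sketch of the sparse-MHA idea: keep only the largest attention weights
    per query. SPT computes and stores only those weights; this dense-then-mask
    version reproduces the math but not the memory savings."""

    def __init__(self, dim: int, num_heads: int, k: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.head_dim, self.k = num_heads, dim // num_heads, k
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (B, T, self.num_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5      # (B, H, T, T)
        # Keep the k largest scores per query; mask the rest to -inf before softmax.
        kth = scores.topk(min(self.k, T), dim=-1).values[..., -1:]   # k-th largest
        attn = scores.masked_fill(scores < kth, float("-inf")).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.proj(out)


class RoutedFFN(nn.Module):
    """Sketch of the routed-FFN idea: a router picks, per token, a few blocks
    of the FFN's hidden units, so only that subset of parameters is applied.
    The equal-block partition and softmax gating are assumptions."""

    def __init__(self, dim: int, hidden: int, num_blocks: int, top_blocks: int):
        super().__init__()
        assert hidden % num_blocks == 0
        self.block_size = hidden // num_blocks
        self.top_blocks = top_blocks
        self.router = nn.Linear(dim, num_blocks)
        self.w1 = nn.Parameter(torch.randn(num_blocks, dim, self.block_size) * 0.02)
        self.b1 = nn.Parameter(torch.zeros(num_blocks, self.block_size))
        self.w2 = nn.Parameter(torch.randn(num_blocks, self.block_size, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        tokens = x.reshape(-1, D)                          # (N, D)
        gates = self.router(tokens).softmax(dim=-1)        # (N, num_blocks)
        top_g, top_i = gates.topk(self.top_blocks, dim=-1) # per-token block choice
        out = torch.zeros_like(tokens)
        for slot in range(self.top_blocks):
            idx = top_i[:, slot]                           # chosen block per token
            g = top_g[:, slot].unsqueeze(-1)
            # Apply only the selected block's slice of the FFN for each token.
            h = torch.einsum("nd,ndh->nh", tokens, self.w1[idx]) + self.b1[idx]
            out = out + g * torch.einsum("nh,nhd->nd", F.gelu(h), self.w2[idx])
        return out.reshape(B, T, D)


# Usage: token-level routing activates 2 of 8 FFN blocks per token here.
x = torch.randn(2, 16, 64)
y = TopKSparseAttention(dim=64, num_heads=4, k=8)(x)
z = RoutedFFN(dim=64, hidden=256, num_blocks=8, top_blocks=2)(y)
```

The gather-per-slot loop trades a little redundant compute for simplicity; a production kernel would instead group tokens by chosen block, which is where the computation savings the snippet mentions would come from.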
