no code implementations • 19 Feb 2025 • Boyang Zhang, Daning Cheng, Yunquan Zhang, Meiqi Tu, Fangmin Liu, Jiake Tian
The exponential growth in parameter size and computational complexity of deep models poses significant challenges for efficient deployment.
no code implementations • 9 Dec 2024 • Boyang Zhang, Daning Cheng, Yunquan Zhang, Fangmin Liu, WenGuang Chen
A key challenge is effectively leveraging compression errors and defining the boundaries for lossless compression to minimize model loss.
no code implementations • 9 Dec 2024 • Boyang Zhang, Daning Cheng, Yunquan Zhang, Fangmin Liu, Jiake Tian
Low-rank factorization is a popular model compression technique that minimizes the error $\delta$ between approximated and original weight matrices.
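The standard way to minimize the Frobenius-norm error $\delta$ described above is truncated SVD, which by the Eckart–Young theorem gives the best rank-$r$ approximation. A minimal illustrative sketch (not the paper's specific method; the function name and shapes are assumptions):

```python
import numpy as np

def low_rank_approx(W: np.ndarray, rank: int) -> np.ndarray:
    """Best rank-`rank` approximation of W in Frobenius norm (Eckart-Young)."""
    # Thin SVD: W = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the top `rank` singular triplets
    return (U[:, :rank] * S[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))        # stand-in for a weight matrix
W_hat = low_rank_approx(W, rank=8)
delta = np.linalg.norm(W - W_hat)        # the approximation error ||delta||_F
```

Increasing the rank monotonically shrinks `delta`, at the cost of storing more factors; compression methods pick the rank that balances this error against parameter savings.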
no code implementations • 9 Dec 2024 • Boyang Zhang, Daning Cheng, Yunquan Zhang, Fangmin Liu
We introduce a deep model series expansion framework to address this issue, enabling rapid and accurate approximation of unquantized models without calibration sets or fine-tuning.
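One simple way to realize a series-style approximation of a weight matrix, in the spirit of the expansion described above, is to quantize the residual left by each previous term so that partial sums converge toward the original weights. This is only an illustrative sketch under assumed details (uniform symmetric quantizer, function names `quantize` and `series_expand`), not the paper's actual framework:

```python
import numpy as np

def quantize(W: np.ndarray, bits: int = 4) -> np.ndarray:
    """Uniform symmetric quantizer (illustrative stand-in)."""
    # Assumes W is not all zeros; scale maps the max magnitude to the top level.
    scale = np.abs(W).max() / (2 ** (bits - 1) - 1)
    return np.round(W / scale) * scale

def series_expand(W: np.ndarray, terms: int = 3, bits: int = 4) -> list[np.ndarray]:
    """Expand W as a sum of quantized terms; each term quantizes the residual."""
    approx = np.zeros_like(W)
    parts = []
    for _ in range(terms):
        q = quantize(W - approx, bits)   # quantize what is still unexplained
        parts.append(q)
        approx += q
    return parts

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))
parts = series_expand(W, terms=3)
residual = np.linalg.norm(W - sum(parts))
```

Because each term targets the remaining residual, the partial-sum error shrinks as more terms are added, and no calibration data or fine-tuning is involved in this construction.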