1 code implementation • 28 Mar 2024 • Pingcheng Dong, Yonghao Tan, Dong Zhang, Tianwei Ni, Xuejiao Liu, Yu Liu, Peng Luo, Luhong Liang, Shih-Yang Liu, Xijie Huang, Huaiyu Zhu, Yun Pan, Fengwei An, Kwang-Ting Cheng
Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs.
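A minimal sketch of why such non-linear operations are costly and how a hardware-friendly substitute can look: below, GELU is replaced by a small piecewise-linear look-up-table approximation. This is a generic illustration under assumed breakpoints and segment count, not the authors' specific quantization-aware approximation.

```python
# Hypothetical sketch: approximating GELU with a 16-segment piecewise-linear
# look-up table, a common hardware-friendly substitute for the exact
# erf/exp-based form. Illustrative only; not the paper's method.
import torch

def gelu_pwl(x: torch.Tensor, num_segments: int = 16,
             lo: float = -6.0, hi: float = 6.0) -> torch.Tensor:
    # Breakpoints and exact GELU values are pre-computed offline; at run time
    # only a clamp, an index look-up and one multiply-add remain.
    knots = torch.linspace(lo, hi, num_segments + 1)
    vals = torch.nn.functional.gelu(knots)
    xc = x.clamp(lo, hi)
    idx = ((xc - lo) / (hi - lo) * num_segments).long().clamp(max=num_segments - 1)
    x0, x1 = knots[idx], knots[idx + 1]
    y0, y1 = vals[idx], vals[idx + 1]
    return y0 + (y1 - y0) * (xc - x0) / (x1 - x0)

x = torch.randn(8, 128)
err = (torch.nn.functional.gelu(x) - gelu_pwl(x)).abs().max()
print(f"max abs error of 16-segment PWL GELU: {err.item():.4f}")
```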
no code implementations • 24 Jan 2024 • Dong Zhang, Pingcheng Dong, Xinting Hu, Long Chen, Kwang-Ting Cheng
Concurrently, the relation distillation transfers implicit relations from the teacher model to the student model using pixel-level self-relation as a bridge, ensuring that the student's mask has strong target region connectivity.
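A minimal sketch of pixel-level self-relation distillation, assuming cosine-similarity affinity matrices and an MSE matching loss (illustrative assumptions; the paper's exact formulation may differ): the student's intra-image pixel affinities are pushed toward the teacher's, which is what encourages connected target regions in the student's mask.

```python
# Hypothetical sketch: match the teacher's and student's pixel-level
# self-relation (intra-image affinity) matrices so the student inherits
# the teacher's target-region connectivity. Shapes and MSE loss are
# assumptions for illustration.
import torch
import torch.nn.functional as F

def pixel_self_relation(feat: torch.Tensor) -> torch.Tensor:
    # feat: (B, C, H, W) -> (B, H*W, H*W) cosine-similarity affinity matrix
    f = feat.flatten(2).transpose(1, 2)          # (B, H*W, C)
    f = F.normalize(f, dim=-1)
    return f @ f.transpose(1, 2)                 # (B, H*W, H*W)

def relation_distill_loss(student_feat, teacher_feat):
    # Affinity matrices are channel-agnostic, so teacher and student
    # feature dimensions need not match.
    return F.mse_loss(pixel_self_relation(student_feat),
                      pixel_self_relation(teacher_feat).detach())

s = torch.randn(2, 64, 32, 32, requires_grad=True)   # student features
t = torch.randn(2, 256, 32, 32)                       # teacher features
loss = relation_distill_loss(s, t)
loss.backward()
print(loss.item())
```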
1 code implementation • 25 Oct 2023 • Shih-Yang Liu, Zechun Liu, Xijie Huang, Pingcheng Dong, Kwang-Ting Cheng
Our method, for the first time, can quantize both weights and activations in the LLaMA-13B to only 4-bit and achieves an average score of 63.1 on the common sense zero-shot reasoning tasks, which is only 5.8 lower than the full-precision model, significantly outperforming the previous state-of-the-art by 12.7 points.
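For orientation, a minimal sketch of what 4-bit fake quantization of a weight or activation tensor looks like, using a plain uniform symmetric integer quantizer. This is only an illustration of the bit-width involved, not the authors' quantization scheme.

```python
# Hypothetical sketch: uniform symmetric 4-bit fake-quantization of a tensor.
# Generic integer quantizer for illustration; not the paper's method.
import torch

def fake_quant_4bit(x: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (n_bits - 1) - 1                  # 7 for signed 4-bit
    scale = x.abs().max() / qmax                  # per-tensor scale (assumed)
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale                              # dequantized tensor

w = torch.randn(4096, 4096)                       # e.g. one large weight matrix
w_q = fake_quant_4bit(w)
print("mean abs quantization error:", (w - w_q).abs().mean().item())
```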