Search Results for author: Jiaming Tang

Found 1 paper, 1 paper with code

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

2 code implementations • 1 Jun 2023 • Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Xingyu Dang, Chuang Gan, Song Han

Large language models (LLMs) have shown excellent performance on various tasks, but the astronomical model size raises the hardware barrier for serving (memory size) and slows down token generation (memory bandwidth).

Common Sense Reasoning, Language Modelling, +1
