no code implementations • 16 Nov 2023 • Miles Williams, Nikolaos Aletras
Pruning and quantization form the foundation of model compression for neural networks, enabling efficient inference for large language models (LLMs).
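The two techniques named above can be illustrated with a minimal sketch. This is not the authors' method, just a generic example of unstructured magnitude pruning (zeroing the smallest-magnitude weights) and symmetric round-to-nearest int8 quantization; the function names and the NumPy-based setup are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured magnitude pruning: zero the smallest-magnitude weights.

    Illustrative sketch only, not any specific paper's algorithm.
    """
    k = int(weights.size * sparsity)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int8(weights: np.ndarray):
    """Symmetric round-to-nearest int8 quantization (per-tensor scale)."""
    scale = np.abs(weights).max() / 127.0  # map max magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q.astype(float) * scale
```

In practice, compression methods for LLMs build on these primitives with more sophisticated criteria (e.g. activation-aware importance scores or per-channel scales) rather than raw magnitudes and a single per-tensor scale.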
1 code implementation • 15 Nov 2023 • George Chrysostomou, Zhixue Zhao, Miles Williams, Nikolaos Aletras
Despite the remarkable performance of generative large language models (LLMs) on abstractive summarization, they face two significant challenges: their considerable size and tendency to hallucinate.
1 code implementation • 15 Sep 2023 • Miles Williams, Nikolaos Aletras
The extensive memory footprint of pre-trained language models (PLMs) can hinder deployment in memory-constrained settings, such as cloud environments or on-device applications.