Search Results for author: Miles Williams

Found 3 papers, 2 papers with code

How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models?

no code implementations • 16 Nov 2023 • Miles Williams, Nikolaos Aletras

Pruning and quantization form the foundation of model compression for neural networks, enabling efficient inference for large language models (LLMs).

Model Compression • Quantization
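The paper's own methods are not reproduced here; as a loose illustration of why calibration data matters for post-training pruning, the sketch below scores weights with a magnitude-times-activation-norm proxy (in the spirit of Wanda-style pruning), so a different calibration set produces a different pruning mask. The function prune_linear_with_calibration and the random tensors are hypothetical stand-ins, not the paper's setup.

# Minimal sketch (assumption, not the paper's method): calibration-dependent
# one-shot pruning of a single linear layer's weight matrix.
import torch

def prune_linear_with_calibration(weight: torch.Tensor,
                                  calib_inputs: torch.Tensor,
                                  sparsity: float = 0.5) -> torch.Tensor:
    """weight: (out_features, in_features); calib_inputs: (n_samples, in_features)."""
    # Per-input-channel activation norm estimated from the calibration set.
    act_norm = calib_inputs.norm(p=2, dim=0)          # (in_features,)
    scores = weight.abs() * act_norm                  # importance proxy per weight
    k = int(weight.numel() * sparsity)
    threshold = scores.flatten().kthvalue(k).values   # k-th smallest score
    mask = scores > threshold                         # keep only higher-scoring weights
    return weight * mask                              # pruned copy of the weights

# Hypothetical usage with random stand-ins for a layer and a calibration batch.
w = torch.randn(64, 128)
calib = torch.randn(32, 128)
w_pruned = prune_linear_with_calibration(w, calib, sparsity=0.5)
print(f"sparsity: {(w_pruned == 0).float().mean().item():.2f}")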

Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization

1 code implementation • 15 Nov 2023 • George Chrysostomou, Zhixue Zhao, Miles Williams, Nikolaos Aletras

Despite the remarkable performance of generative large language models (LLMs) on abstractive summarization, they face two significant challenges: their considerable size and tendency to hallucinate.

Abstractive Text Summarization • Hallucination • +1

Frustratingly Simple Memory Efficiency for Pre-trained Language Models via Dynamic Embedding Pruning

1 code implementation • 15 Sep 2023 • Miles Williams, Nikolaos Aletras

The extensive memory footprint of pre-trained language models (PLMs) can hinder deployment in memory-constrained settings, such as cloud environments or on-device applications.
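As a rough illustration of the idea named in the title, and not the paper's exact method, the sketch below drops embedding rows for vocabulary items that never occur in a target corpus, shrinking the embedding matrix that accounts for much of a PLM's memory footprint. The function prune_embedding and the vocabulary/corpus sizes are hypothetical.

# Minimal sketch (assumption, not the paper's method): prune unused embedding rows.
import torch
import torch.nn as nn

def prune_embedding(embedding: nn.Embedding, used_token_ids: torch.Tensor):
    """Return a smaller embedding plus an old-id -> new-id lookup table."""
    used_token_ids = torch.unique(used_token_ids)
    new_embedding = nn.Embedding(len(used_token_ids), embedding.embedding_dim)
    new_embedding.weight.data = embedding.weight.data[used_token_ids].clone()
    # Old token ids map to their new row index; unused ids map to -1.
    remap = torch.full((embedding.num_embeddings,), -1, dtype=torch.long)
    remap[used_token_ids] = torch.arange(len(used_token_ids))
    return new_embedding, remap

# Hypothetical usage: a 50k-token vocabulary where only ~1k ids appear in the corpus.
emb = nn.Embedding(50_000, 768)
corpus_ids = torch.randint(0, 1_000, (10_000,))
small_emb, remap = prune_embedding(emb, corpus_ids)
print(emb.weight.numel(), "->", small_emb.weight.numel())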
