no code implementations • 16 Nov 2023 • Miles Williams, Nikolaos Aletras
Pruning and quantization form the foundation of model compression for neural networks, enabling efficient inference for large language models (LLMs).
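The two techniques named above can be illustrated with a minimal sketch. This is not the authors' method, just a generic example of unstructured magnitude pruning (zeroing the smallest-magnitude weights) and symmetric round-to-nearest int8 quantization; the function names and the NumPy-based setup are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured magnitude pruning: zero the smallest-magnitude weights.

    Illustrative sketch only, not any specific paper's algorithm.
    """
    k = int(weights.size * sparsity)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int8(weights: np.ndarray):
    """Symmetric round-to-nearest int8 quantization (per-tensor scale)."""
    scale = np.abs(weights).max() / 127.0  # map max magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q.astype(float) * scale
```

In practice, compression methods for LLMs build on these primitives with more sophisticated criteria (e.g. activation-aware importance scores or per-channel scales) rather than raw magnitudes and a single per-tensor scale.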
1 code implementation • 15 Nov 2023 • George Chrysostomou, Zhixue Zhao, Miles Williams, Nikolaos Aletras
Despite the remarkable performance of generative large language models (LLMs) on abstractive summarization, they face two significant challenges: their considerable size and tendency to hallucinate.
1 code implementation • 15 Sep 2023 • Miles Williams, Nikolaos Aletras
The extensive memory footprint of pre-trained language models (PLMs) can hinder deployment in memory-constrained settings, such as cloud environments or on-device applications.