Search Results for author: Paolo Glorioso

Found 3 papers, 2 with code

The Unreasonable Ineffectiveness of the Deeper Layers

1 code implementation • 26 Mar 2024 • Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.

Tasks: Quantization, Question Answering
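The layer-pruning strategy summarized above can be sketched minimally in plain Python. This is an illustrative assumption, not the paper's implementation: the paper chooses which contiguous block of layers to drop by measuring similarity of hidden representations, whereas here the block location (`start`, `n_prune`) is simply given, and "layers" are arbitrary callables standing in for transformer blocks.

```python
def prune_layers(layers, start, n_prune):
    """Drop a contiguous block of n_prune layers starting at index start.

    Hypothetical sketch of simple layer pruning: remove a contiguous block
    of layers and compose the remainder into a new forward pass.
    """
    kept = layers[:start] + layers[start + n_prune:]

    def model(x):
        # forward pass through the surviving layers only
        for layer in kept:
            x = layer(x)
        return x

    return model, kept

# Toy usage: 8 "layers", remove 4 (half of them), echoing the paper's
# finding that up to half the layers can go with minimal degradation.
layers = [lambda x, i=i: x + i for i in range(8)]
model, kept = prune_layers(layers, start=2, n_prune=4)
print(len(kept))   # 4 layers remain
print(model(0))    # applies layers 0, 1, 6, 7 -> 14
```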

BlackMamba: Mixture of Experts for State-Space Models

1 code implementation • 1 Feb 2024 • Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge

In this paper, we present BlackMamba, a novel architecture that combines the Mamba SSM with MoE to obtain the benefits of both.

Tasks: Language Modelling
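The architectural idea in the abstract above, alternating a state-space sequence mixer with mixture-of-experts MLPs, can be caricatured in a few lines of NumPy. Everything here is an assumption for illustration: the `ssm_block` placeholder (a causal running mean) stands in for a real Mamba SSM layer, and the top-1 router is one common MoE routing choice, not necessarily BlackMamba's.

```python
import numpy as np

def ssm_block(x):
    # Placeholder for a Mamba SSM layer: a causal cumulative mean over
    # time, i.e. each position only sees earlier positions.
    return np.cumsum(x, axis=0) / np.arange(1, len(x) + 1)[:, None]

def moe_block(x, experts, router_w):
    # Top-1 routing: each token is sent to the expert with the highest
    # router score (an illustrative routing rule).
    scores = x @ router_w              # (time, n_experts)
    choice = scores.argmax(axis=1)
    out = np.empty_like(x)
    for e, expert in enumerate(experts):
        mask = choice == e
        if mask.any():
            out[mask] = expert(x[mask])
    return out

rng = np.random.default_rng(0)
d, T, n_experts = 4, 6, 2
experts = [lambda h, W=rng.normal(size=(d, d)): h @ W
           for _ in range(n_experts)]
router_w = rng.normal(size=(d, n_experts))

x = rng.normal(size=(T, d))
for _ in range(2):        # two alternating SSM / MoE blocks
    x = ssm_block(x)
    x = moe_block(x, experts, router_w)
print(x.shape)            # sequence and model dims preserved: (6, 4)
```

The design point the sketch captures is the interleaving: sequence mixing is handled by the (dense) SSM blocks, while the MoE blocks add parameter capacity at roughly constant per-token compute, since each token activates only one expert.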

Flatter, faster: scaling momentum for optimal speedup of SGD

no code implementations • 28 Oct 2022 • Aditya Cowsik, Tankut Can, Paolo Glorioso

Commonly used optimization algorithms often show a trade-off between good generalization and fast training times.
