1 code implementation • 26 Mar 2024 • Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts
We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.
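The abstract describes dropping whole decoder blocks from a pretrained model; a minimal sketch of that idea is below. It is not the authors' released code: the checkpoint name is a placeholder, and it assumes a LLaMA-style Hugging Face model whose decoder blocks live in `model.model.layers`.

```python
# Hedged sketch of block-wise layer pruning (illustrative, not the paper's implementation).
import torch
from torch import nn
from transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any LLaMA-style checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

def drop_layers(model, start: int, n_drop: int):
    """Remove a contiguous block of decoder layers [start, start + n_drop)."""
    layers = model.model.layers  # nn.ModuleList of decoder blocks (LLaMA-style)
    kept = [layer for i, layer in enumerate(layers) if not (start <= i < start + n_drop)]
    model.model.layers = nn.ModuleList(kept)
    model.config.num_hidden_layers = len(kept)
    # Note: cached generation may require re-indexing per-layer attention metadata.
    return model

# Example: drop 8 consecutive deep layers, then re-evaluate on QA benchmarks.
model = drop_layers(model, start=20, n_drop=8)
```

In the paper, which block to remove is chosen by a similarity measure between layer inputs and outputs, and the pruned model is lightly finetuned afterward; the sketch above only shows the pruning step itself.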
1 code implementation • 1 Feb 2024 • Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge
In this paper, we present BlackMamba, a novel architecture that combines the Mamba SSM with MoE to obtain the benefits of both.
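To illustrate the layer pattern the abstract describes, here is a toy PyTorch sketch that pairs a state-space-style sequence mixer with a mixture-of-experts MLP. It is not the BlackMamba implementation: the Mamba mixer is stubbed with a GRU so the example runs without the `mamba_ssm` package, and the top-1 router omits load balancing.

```python
# Hedged toy sketch of an SSM + MoE block in the spirit of BlackMamba.
import torch
from torch import nn

class TopOneMoE(nn.Module):
    """Minimal top-1 routed mixture-of-experts MLP (illustrative only)."""
    def __init__(self, d_model: int, n_experts: int = 8, d_ff: int = 1024):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x).softmax(dim=-1)
        top_w, top_idx = scores.max(dim=-1)          # best expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                out[mask] = expert(x[mask]) * top_w[mask].unsqueeze(-1)
        return out

class SSMMoEBlock(nn.Module):
    """Residual sequence mixer followed by a residual MoE MLP."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        # Stand-in for the Mamba SSM; swap in mamba_ssm.Mamba(d_model) if installed.
        self.mixer = nn.GRU(d_model, d_model, batch_first=True)
        self.moe = TopOneMoE(d_model)

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))[0]   # sequence mixing (SSM stand-in)
        x = x + self.moe(self.norm2(x))        # sparse expert MLP
        return x

x = torch.randn(2, 16, 256)
print(SSMMoEBlock(256)(x).shape)  # torch.Size([2, 16, 256])
```

The intended benefit stated in the abstract is to combine the linear-time sequence mixing of the Mamba SSM with the parameter efficiency of sparse expert MLPs; the sketch only shows how the two kinds of sublayer compose within one block.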
no code implementations • 28 Oct 2022 • Aditya Cowsik, Tankut Can, Paolo Glorioso
Commonly used optimization algorithms often show a trade-off between good generalization and fast training times.