no code implementations • 1 Dec 2023 • Fabio Fehr, James Henderson
We extend the NVIB framework to replace all types of attention functions in Transformers, and show that existing pretrained Transformers can be reinterpreted as Nonparametric Variational (NV) models using a proposed identity initialisation.
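To make the identity-initialisation idea concrete, here is a minimal sketch: a standard scaled dot-product attention augmented with an NVIB-style per-key bias (suggestive of log pseudo-counts over Gaussian components) whose parameters are zero-initialised, so the layer reproduces the pretrained attention exactly at the start of training. The class name and the specific bias term are illustrative assumptions, not the paper's exact denoising-attention formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IdentityInitNVIBAttention(nn.Module):
    """Sketch: pretrained attention plus zero-initialised NVIB-style terms."""

    def __init__(self, d_model: int):
        super().__init__()
        self.scale = d_model ** -0.5
        # NVIB-style per-key bias (hypothetical), zero-initialised so the
        # layer is exactly equivalent to the pretrained attention at init.
        self.pseudo_count = nn.Linear(d_model, 1)
        nn.init.zeros_(self.pseudo_count.weight)
        nn.init.zeros_(self.pseudo_count.bias)

    def forward(self, q, k, v):
        # q: (batch, n_q, d), k/v: (batch, n_k, d)
        scores = q @ k.transpose(-2, -1) * self.scale
        # Per-key bias, broadcast over queries; vanishes at initialisation,
        # so the output equals ordinary softmax attention until trained.
        scores = scores + self.pseudo_count(k).transpose(-2, -1)
        return F.softmax(scores, dim=-1) @ v
```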
2 code implementations • 26 Oct 2023 • Melika Behjati, Fabio Fehr, James Henderson
Finally, we show that NVIB compression results in a model that is more robust to adversarial perturbations.
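As a rough illustration of how such a robustness claim can be probed, the sketch below measures the accuracy gap between clean and character-perturbed inputs. The perturbation is a generic random-substitution noise, not the paper's specific attack, and `model` is a hypothetical text-classifier callable.

```python
import random

def perturb_chars(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Randomly substitute letters to simulate character-level input noise."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars)):
        if chars[i].isalpha() and rng.random() < rate:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

def robustness_gap(model, texts, labels, rate: float = 0.1) -> float:
    # Accuracy drop from clean to perturbed inputs; a smaller gap
    # indicates a model that is more robust to this kind of noise.
    clean = sum(model(t) == y for t, y in zip(texts, labels)) / len(texts)
    noisy = sum(model(perturb_chars(t, rate)) == y
                for t, y in zip(texts, labels)) / len(texts)
    return clean - noisy
```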
no code implementations • 27 Jul 2022 • James Henderson, Fabio Fehr
We propose a VAE for Transformers by developing a variational information bottleneck regulariser for Transformer embeddings.
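The core mechanism of a variational information bottleneck on embeddings can be sketched as follows: project each token embedding to a Gaussian posterior, sample with the reparameterisation trick, and penalise the KL divergence to a standard normal prior. This is the standard VIB recipe; the layer names are assumptions, not the paper's API.

```python
import torch
import torch.nn as nn

class VIBEmbeddingRegulariser(nn.Module):
    """Sketch: Gaussian bottleneck over token embeddings with a KL penalty."""

    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.to_mu = nn.Linear(d_model, d_latent)
        self.to_logvar = nn.Linear(d_model, d_latent)

    def forward(self, embeddings: torch.Tensor):
        # embeddings: (batch, n_tokens, d_model)
        mu = self.to_mu(embeddings)
        logvar = self.to_logvar(embeddings)
        # Reparameterisation trick: z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions
        # and averaged over tokens and batch.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1).mean()
        return z, kl
```

In training, the returned `kl` term would be added to the task loss with a weight that controls how strongly the bottleneck regularises the embeddings.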
3 code implementations • 7 Mar 2022 • Florian Mai, Arnaud Pannatier, Fabio Fehr, Haolin Chen, François Marelli, François Fleuret, James Henderson
We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding.
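For reference, this is what the criticised static token mixing looks like: an MLP of fixed size applied across the token dimension, independently for each feature channel, so the weights are tied to a fixed sequence length. The class name is illustrative.

```python
import torch
import torch.nn as nn

class TokenMixingMLP(nn.Module):
    """Sketch of MLPMixer-style token mixing with static, fixed-size weights."""

    def __init__(self, n_tokens: int, d_hidden: int):
        super().__init__()
        # The MLP's input/output size is the number of tokens, so the
        # sequence length must be fixed in advance.
        self.mlp = nn.Sequential(
            nn.Linear(n_tokens, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, n_tokens),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_tokens, d_model). Transpose so the MLP mixes
        # information across tokens within each feature channel.
        return self.mlp(x.transpose(1, 2)).transpose(1, 2)
```

The paper's remedy, by contrast, is to generate the token-mixing weights dynamically from the input rather than learning them as static parameters, which lifts the fixed-length constraint that makes the static version a poor fit for natural language.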