NPEFF: Non-Negative Per-Example Fisher Factorization
1 code implementation • 7 Oct 2023 • Michael Matena, Colin Raffel
Using unique properties of NPEFF's parameter-space representations, we ran extensive experiments to verify that the connections NPEFF recovers between directions in parameter space and examples actually reflect the model's processing.
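To make this concrete, the following is a rough, illustrative sketch of an NPEFF-style decomposition, not the paper's actual algorithm: it assumes per-example diagonal Fisher estimates can be approximated by squared per-example gradients, and that a generic non-negative matrix factorization (here scikit-learn's NMF) recovers components (directions in parameter space) along with per-example coefficients.

```python
import numpy as np
from sklearn.decomposition import NMF

n_examples, n_params, n_components = 256, 10_000, 16

# Hypothetical per-example gradients d log p(y|x) / d theta; in practice these
# would come from backpropagation through the model on each example.
grads = np.random.randn(n_examples, n_params)
fishers = grads ** 2  # diagonal per-example Fisher estimates (non-negative)

# Factor the stacked Fisher representations into non-negative components.
nmf = NMF(n_components=n_components, init="nndsvda", max_iter=500)
coeffs = nmf.fit_transform(fishers)   # (n_examples, n_components) weights
components = nmf.components_          # (n_components, n_params) directions

# Examples with the largest weight on a component are the ones it "explains".
top_examples = np.argsort(-coeffs[:, 0])[:10]
```

Linking each component's top-weighted examples back to its parameter-space direction is the kind of connection the experiments above set out to validate.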
no code implementations • 1 Oct 2022 • Michael Matena, Colin Raffel
We explore the implications of this combinatorial aspect of ReLU optimization in this work.
Merging Models with Fisher-Weighted Averaging
2 code implementations • 18 Nov 2021 • Michael Matena, Colin Raffel
Computing a simple average of the models' parameters therefore corresponds to making an isotropic Gaussian approximation to their posteriors.
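To make the isotropic-Gaussian reading concrete, here is a minimal parameter-merging sketch; the function and the optional precision-weighted branch (using diagonal Fisher estimates, which would have to be computed separately) are illustrative assumptions, not the paper's exact implementation.

```python
def merge_state_dicts(state_a, state_b, fisher_a=None, fisher_b=None, eps=1e-10):
    """Merge two models with identical architectures, parameter by parameter.

    Usage: merged = merge_state_dicts(model_a.state_dict(), model_b.state_dict())
    """
    merged = {}
    for name, theta_a in state_a.items():
        theta_b = state_b[name]
        if fisher_a is None or fisher_b is None:
            # Simple average: treating each posterior as an isotropic Gaussian
            # with a shared variance makes the merged mean the plain mean.
            merged[name] = (theta_a + theta_b) / 2
        else:
            # Precision-weighted average: diagonal Fisher estimates act as
            # per-parameter precisions, so better-determined parameters
            # dominate the merge.
            fa, fb = fisher_a[name], fisher_b[name]
            merged[name] = (fa * theta_a + fb * theta_b) / (fa + fb + eps)
    return merged
```

Passing per-parameter precision estimates as the Fisher arguments turns the isotropic approximation into a diagonal one.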
Do Transformer Modifications Transfer Across Implementations and Applications?
1 code implementation • EMNLP 2021 • Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel
The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
53 code implementations • arXiv 2019 • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
Ranked #1 on Sentiment Analysis on SST-2 Binary classification
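As a concrete illustration of this pretrain-then-fine-tune recipe (not code from the paper), here is a minimal sketch assuming the Hugging Face transformers library and its "t5-small" checkpoint; the SST-2-style task prefix follows T5's text-to-text convention, and the single optimization step stands in for a full fine-tuning loop.

```python
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

# Start from publicly released pre-trained weights (the "pre-train" stage).
tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.train()

# Fine-tune on a downstream task cast as text-to-text, e.g. SST-2 sentiment.
inputs = tokenizer("sst2 sentence: a gorgeous, witty film", return_tensors="pt")
labels = tokenizer("positive", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
optimizer.zero_grad()
loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()
optimizer.step()
```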