no code implementations • 5 Oct 2022 • Yiren Zhao, Oluwatomisin Dada, Xitong Gao, Robert D Mullins
Large neural networks are often overparameterised and prone to overfitting. Dropout is a widely used regularisation technique that combats overfitting and improves model generalisation.
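As context for the dropout technique the abstract refers to, here is a minimal sketch of standard "inverted" dropout (not the paper's specific method): each unit is zeroed with probability p during training, and survivors are rescaled by 1/(1-p) so the expected activation is unchanged. The function name and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def dropout(x, p=0.5, rng=None):
    """Inverted dropout: zero each unit with probability p and rescale
    survivors by 1/(1-p) so the expected activation is unchanged.
    Applied only at training time; at inference the input passes through."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1-p
    return x * mask / (1.0 - p)
```

At test time the layer is simply the identity, which is why the rescaling is done during training rather than at inference.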
no code implementations • 2 Oct 2022 • Jason Ross Brown, Yiren Zhao, Ilia Shumailov, Robert D Mullins
Given the wide and ever-growing range of efficient Transformer attention mechanisms, it is important to identify which attention mechanism is most effective for a given task.
no code implementations • 2 Oct 2022 • Jason Ross Brown, Yiren Zhao, Ilia Shumailov, Robert D Mullins
We demonstrate that wide single layer Transformer models can compete with or outperform deeper ones in a variety of Natural Language Processing (NLP) tasks when both are trained from scratch.