Search Results for author: Robert D Mullins

Found 3 papers, 0 papers with code

Revisiting Structured Dropout

No code implementations · 5 Oct 2022 · Yiren Zhao, Oluwatomisin Dada, Xitong Gao, Robert D Mullins

Large neural networks are often overparameterised and prone to overfitting. Dropout is a widely used regularization technique to combat overfitting and improve model generalization.

Tasks: Scheduling

DARTFormer: Finding The Best Type Of Attention

No code implementations · 2 Oct 2022 · Jason Ross Brown, Yiren Zhao, Ilia Shumailov, Robert D Mullins

Given the wide and ever-growing range of efficient Transformer attention mechanisms, it is important to identify which attention is most effective for a given task.

Tasks: ListOps, Neural Architecture Search, +3 more

Wide Attention Is The Way Forward For Transformers?

No code implementations · 2 Oct 2022 · Jason Ross Brown, Yiren Zhao, Ilia Shumailov, Robert D Mullins

We demonstrate that wide single layer Transformer models can compete with or outperform deeper ones in a variety of Natural Language Processing (NLP) tasks when both are trained from scratch.

Tasks: Text Classification
