no code implementations • 5 Oct 2022 • Yiren Zhao, Oluwatomisin Dada, Xitong Gao, Robert D Mullins
Large neural networks are often overparameterised and prone to overfitting. Dropout is a widely used regularisation technique that combats overfitting and improves model generalisation.
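As context for the dropout technique the abstract refers to, here is a minimal sketch of standard "inverted" dropout (not the paper's specific method): each unit is zeroed with probability p during training, and survivors are rescaled by 1/(1-p) so the expected activation is unchanged. The function name and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def dropout(x, p=0.5, rng=None):
    """Inverted dropout: zero each unit with probability p and rescale
    survivors by 1/(1-p) so the expected activation is unchanged.
    Applied only at training time; at inference the input passes through."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1-p
    return x * mask / (1.0 - p)
```

At test time the layer is simply the identity, which is why the rescaling is done during training rather than at inference.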
no code implementations • 2 Oct 2022 • Jason Ross Brown, Yiren Zhao, Ilia Shumailov, Robert D Mullins
Given the wide and ever-growing range of efficient Transformer attention mechanisms, it is important to identify which attention mechanism is most effective for a given task.
no code implementations • 2 Oct 2022 • Jason Ross Brown, Yiren Zhao, Ilia Shumailov, Robert D Mullins
We demonstrate that wide single layer Transformer models can compete with or outperform deeper ones in a variety of Natural Language Processing (NLP) tasks when both are trained from scratch.