Search Results for author: Elan van Biljon

Found 8 papers, 5 papers with code

On Optimal Transformer Depth for Low-Resource Language Translation

1 code implementation • 9 Apr 2020 • Elan van Biljon, Arnu Pretorius, Julia Kreutzer

By showing that transformer models perform well (and often best) at low-to-moderate depth, we hope to convince fellow researchers to devote fewer computational resources, and less time, to exploring overly large models during the development of these systems.

Tasks: Machine Translation, NMT, +1
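As an illustration only (a hypothetical configuration, not the paper's actual model or hyperparameters), a "low-to-moderate depth" transformer for NMT might be built in PyTorch by simply reducing the encoder and decoder layer counts:

```python
import torch
import torch.nn as nn

# Hypothetical shallow configuration: 3 encoder/decoder layers
# instead of the 6+ commonly used for high-resource NMT.
shallow_transformer = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=3,
    num_decoder_layers=3,
    dim_feedforward=2048,
    dropout=0.1,
)

# Dummy source/target embeddings: (seq_len, batch, d_model).
src = torch.rand(10, 32, 512)
tgt = torch.rand(9, 32, 512)
out = shallow_transformer(src, tgt)
print(out.shape)  # torch.Size([9, 32, 512])
```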

If dropout limits trainable depth, does critical initialisation still matter? A large-scale statistical analysis on ReLU networks

no code implementations • 13 Oct 2019 • Arnu Pretorius, Elan van Biljon, Benjamin van Niekerk, Ryan Eloff, Matthew Reynard, Steve James, Benjamin Rosman, Herman Kamper, Steve Kroon

Our results therefore suggest that, in the shallow-to-moderate depth setting, critical initialisation provides no performance gain over off-critical initialisations, and that searching for off-critical initialisations that might improve training speed or generalisation is likely to be a fruitless endeavour.
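For concreteness, a critical initialisation fixes the weight variance so that signal variance is preserved from layer to layer, while an off-critical initialisation scales that variance up or down. A minimal sketch, assuming a ReLU network with dropout keep probability p and using the noise-aware per-weight variance 2p/n from the variance-preserving condition for noisy rectifier networks (the fan_in and scale values are illustrative):

```python
import numpy as np

def init_std(fan_in: int, keep_prob: float, scale: float = 1.0) -> float:
    """Weight standard deviation for a ReLU layer with dropout.

    scale == 1.0 gives the noise-aware critical initialisation
    (per-weight variance 2 * keep_prob / fan_in); any other scale
    gives an off-critical initialisation of the kind compared here.
    """
    return np.sqrt(scale * 2.0 * keep_prob / fan_in)

critical = init_std(fan_in=512, keep_prob=0.8)              # at criticality
off_critical = init_std(fan_in=512, keep_prob=0.8, scale=1.2)  # off-critical
```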

Critical initialisation for deep signal propagation in noisy rectifier neural networks

1 code implementation • NeurIPS 2018 • Arnu Pretorius, Elan van Biljon, Steve Kroon, Herman Kamper

Simulations and experiments on real-world data confirm that our proposed initialisation stably propagates signals in deep networks, while an initialisation that disregards the noise fails to do so.
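As an illustration of the kind of simulation described (an assumed setup, not the paper's released code), the sketch below propagates a random input through a deep ReLU network with multiplicative dropout noise and tracks the second moment of the activations. The noise-aware critical variance 2p/n keeps it stable, while the standard He variance 2/n, which disregards the noise, lets it grow by roughly a factor of 1/p per layer:

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth, keep_prob = 500, 50, 0.6

def propagate(weight_var: float) -> list[float]:
    """Second moment of activations at each layer of a noisy ReLU net."""
    h = rng.standard_normal(width)
    moments = []
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(weight_var / width)
        mask = (rng.random(width) < keep_prob) / keep_prob  # dropout noise
        h = np.maximum(W @ h, 0.0) * mask                   # noisy ReLU layer
        moments.append(float(np.mean(h ** 2)))
    return moments

he = propagate(2.0)               # He init: ignores the dropout noise
crit = propagate(2.0 * keep_prob) # noise-aware critical init
print(f"final second moment  He: {he[-1]:.3e}   critical: {crit[-1]:.3e}")
```

With keep_prob = 0.6, the He-initialised network's activation magnitude explodes after 50 layers, whereas the critical initialisation holds it near its starting value.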
