Increasing model size when pretraining natural language representations often improves performance on downstream tasks. At some point, however, further increases in model size become harder due to GPU/TPU memory limitations and longer training times...
ICLR 2020 Abstract