1 code implementation • 2 Oct 2023 • Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra
Transformer training is notoriously difficult, requiring careful optimizer design and the use of various heuristics.