Search Results for author: Minhak Song

Found 1 paper, 1 paper with code

Linear attention is (maybe) all you need (to understand transformer optimization)

1 code implementation | 2 Oct 2023 | Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra

Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics.
