Search Results for author: Hofit Bata

Found 3 papers, 1 paper with code

The Depth-to-Width Interplay in Self-Attention

1 code implementation • NeurIPS 2020 • Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua

Our guidelines elucidate the depth-to-width trade-off in self-attention networks of sizes up to the scale of GPT3 (which we project to be too deep for its size), and beyond, marking an unprecedented width of 30K as optimal for a 1-Trillion parameter network.
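As a rough sanity check on the abstract's "width of 30K for a 1-Trillion parameter network" figure, the sketch below uses the standard per-layer parameter-count approximation of ~12·d_model² (attention Q/K/V/O projections plus a 4x-wide FFN, embeddings ignored). This formula and the `implied_depth` helper are assumptions for illustration only, not taken from the paper.

```python
# Illustrative sketch (assumption, not from the paper): relate transformer
# depth and width via the rule of thumb
#   params_per_layer ~ 12 * d_model**2
# (attention Q/K/V/O projections + 4x-wide FFN), ignoring embeddings.

def implied_depth(total_params: float, width: int) -> float:
    """Depth implied by a parameter budget at a given width (hypothetical helper)."""
    return total_params / (12 * width ** 2)

if __name__ == "__main__":
    # GPT-3: ~175B parameters at width 12,288 -> roughly its actual 96 layers.
    print(f"GPT-3 implied depth:  {implied_depth(175e9, 12_288):.0f}")
    # A 1-trillion-parameter model at the abstract's suggested width of 30K
    # comes out to roughly 90-95 layers under this approximation.
    print(f"1T @ width 30K depth: {implied_depth(1e12, 30_000):.0f}")
```

Under this rule of thumb, a 1T-parameter network at width 30K would be around 93 layers deep, while GPT-3's 96 layers at width 12,288 is what the abstract projects to be "too deep for its size".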
