5 code implementations • 27 Dec 2021 • Seung Hoon Lee, SeungHyun Lee, Byung Cheol Song
However, the ViT's high performance results from pre-training on a large dataset such as JFT-300M, and this dependence on large datasets is attributed to its weak locality inductive bias.