1 code implementation • 3 May 2023 • Daochen Zha, Louis Feng, Liang Luo, Bhargav Bhushanam, Zirui Liu, Yusuo Hu, Jade Nie, Yuzhen Huang, Yuandong Tian, Arun Kejariwal, Xia Hu
In this work, we explore a "pre-train, and search" paradigm for efficient sharding.