no code implementations • 30 Oct 2023 • Huiyao Shu, Ang Wang, Ziji Shi, Hanyu Zhao, Yong Li, Lu Lu
However, a memory-efficient execution plan that includes a reasonable operator execution order and tensor memory layout can significantly increase the models' memory efficiency and reduce overheads from high-level techniques.
no code implementations • 16 Feb 2023 • Shiwei Zhang, Lansong Diao, Siyu Wang, Zongyan Cao, Yiliang Gu, Chang Si, Ziji Shi, Zhen Zheng, Chuan Wu, Wei Lin
We present Rhino, a system for accelerating tensor programs with automatic parallelization on an AI platform for real production environments.
no code implementations • 1 Feb 2023 • Ziji Shi, Le Jiang, Ang Wang, Jie Zhang, Xianyan Jia, Yong Li, Chencan Wu, Jialin Li, Wei Lin
However, finding a suitable model parallel schedule for an arbitrary neural network is a non-trivial task due to the exploding search space.
1 code implementation • 25 Jul 2021 • Fuzhao Xue, Ziji Shi, Futao Wei, Yuxuan Lou, Yong Liu, Yang You
To achieve better performance with fewer trainable parameters, recent methods have been proposed to go shallower through parameter sharing or model compression along the depth.
Ranked #677 on Image Classification on ImageNet