1 code implementation • 14 Jul 2022 • Ji Liu, daxiang dong, Xi Wang, An Qin, Xingjian Li, Patrick Valduriez, Dejing Dou, dianhai yu
Although more layers and more parameters generally improve the accuracy of the models, such big models generally have high computational complexity and require big memory, which exceed the capacity of small devices for inference and incurs long training time.