Increasing the dimension of embedding vectors improves model accuracy but substantially increases model size.
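The size cost grows linearly with the embedding dimension, since a dense embedding table stores one vector per row. A minimal sketch of that arithmetic, using illustrative row counts and dimensions not taken from the text:

```python
def embedding_table_bytes(num_rows, dim, bytes_per_param=4):
    """Memory footprint of a dense embedding table.

    Assumes fp32 parameters (4 bytes each); num_rows and dim
    are hypothetical values chosen for illustration.
    """
    return num_rows * dim * bytes_per_param

# Doubling the embedding dimension doubles the table's footprint:
small = embedding_table_bytes(1_000_000, 64)    # 1M rows, 64-dim
large = embedding_table_bytes(1_000_000, 128)   # same rows, 128-dim
print(small, large, large // small)
```

For a table with a million rows, moving from 64 to 128 dimensions adds roughly 256 MB of fp32 parameters, which is why dimension choice is a size/accuracy trade-off rather than a free accuracy knob.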
Based on our evaluation, FlexSA with the proposed compilation heuristic improves compute resource utilization of pruning and training modern CNN models by 37% compared to a conventional training accelerator with a large systolic array.
Training convolutional neural networks (CNNs) requires intense compute throughput and high memory bandwidth.
However, GPU device memory is relatively small, and its capacity cannot be expanded by the user.
State-of-the-art convolutional neural networks (CNNs) used in vision applications have large models with numerous weights.