1 code implementation • 14 Mar 2023 • Kaiqi Zhao, Animesh Jain, Ming Zhao
The work then proposes adaptive pruning policies that automatically meet the pruning objectives of accuracy-critical, memory-constrained, and latency-sensitive tasks.
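Below is a minimal PyTorch sketch of what a memory-constrained pruning policy could look like; the step size, budget check, and loop bound are illustrative assumptions, not the paper's actual algorithm.

```python
import torch
import torch.nn.utils.prune as prune

def prune_to_memory_budget(model, budget_ratio, step=0.05, max_rounds=100):
    """Illustrative sketch, not the paper's policy: raise global sparsity
    in small steps until the fraction of nonzero weights fits the budget."""
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    total = sum(m.weight.nelement() for m, _ in params)
    for _ in range(max_rounds):
        # prune an additional `step` fraction of the remaining weights,
        # ranked globally by L1 magnitude
        prune.global_unstructured(
            params, pruning_method=prune.L1Unstructured, amount=step)
        nonzero = sum(int(m.weight.count_nonzero()) for m, _ in params)
        if nonzero / total <= budget_ratio:  # memory budget met
            break
    return model
```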
no code implementations • 22 Jan 2022 • Kaiqi Zhao, Animesh Jain, Ming Zhao
To solve this problem, we propose two activation-based pruning methods, Iterative Activation-based Pruning (IAP) and Adaptive Iterative Activation-based Pruning (AIAP).
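A rough illustration of the activation-based idea in PyTorch: score each convolution filter by its mean absolute activation on a calibration batch, then zero the weakest filters. The hook, scoring rule, and pruning ratio are simplifications for exposition, not the exact IAP/AIAP procedure.

```python
import torch
import torch.nn as nn

def activation_channel_scores(layer, calib_batch):
    """Mean absolute activation per output channel of a Conv2d,
    recorded with a forward hook on a calibration batch."""
    scores = {}
    def hook(_, __, out):
        scores["s"] = out.abs().mean(dim=(0, 2, 3))  # (N,C,H,W) -> per channel
    h = layer.register_forward_hook(hook)
    with torch.no_grad():
        layer(calib_batch)
    h.remove()
    return scores["s"]

def prune_weakest_channels(layer, calib_batch, ratio=0.2):
    """Zero the filters whose activations are weakest (an illustrative
    simplification of activation-based pruning, not the exact IAP rule)."""
    s = activation_channel_scores(layer, calib_batch)
    k = int(ratio * s.numel())
    idx = torch.argsort(s)[:k]        # channels with the smallest activations
    with torch.no_grad():
        layer.weight[idx] = 0.0       # zero the corresponding filters
        if layer.bias is not None:
            layer.bias[idx] = 0.0
```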
1 code implementation • 21 Jan 2022 • Kaiqi Zhao, Animesh Jain, Ming Zhao
Pruning is a promising approach to compress complex deep learning models in order to deploy them on resource-constrained edge devices.
no code implementations • 27 Mar 2021 • Ziheng Jiang, Animesh Jain, Andrew Liu, Josh Fromm, Chengqian Ma, Tianqi Chen, Luis Ceze
Quantization is a key technique to reduce the resource requirement and improve the performance of neural network deployment.
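As a concrete illustration of the basic mechanism, here is a minimal affine int8 quantize/dequantize sketch in NumPy; the scale and zero-point convention is one common choice, not this paper's scheme.

```python
import numpy as np

def quantize_int8(x):
    """Minimal affine int8 quantization sketch: map a float tensor onto
    [-128, 127] with a scale and zero point, then dequantize back.
    Assumes x spans a nonzero range."""
    scale = (x.max() - x.min()) / 255.0
    zero_point = np.round(-128 - x.min() / scale).astype(np.int32)
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    dq = (q.astype(np.float32) - zero_point) * scale  # dequantized approximation
    return q, dq

x = np.random.randn(4).astype(np.float32)
q, dq = quantize_int8(x)
print(x, dq)  # dq approximates x to within about scale/2 per element
```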
no code implementations • 21 Jan 2021 • Jian Weng, Animesh Jain, Jie Wang, Leyuan Wang, Yida Wang, Tony Nowatzki
However, it is hard to leverage mixed precision without hardware support because of the overhead of data casting.
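A small NumPy sketch of that overhead: emulating a low-precision operation on fp32 data brackets the kernel with explicit casts, each a full pass over the operands (illustrative only, not the paper's benchmark).

```python
import numpy as np

a = np.random.randn(1024, 1024).astype(np.float32)
b = np.random.randn(1024, 1024).astype(np.float32)

# fp16 matmul emulated in software: two down-casts and one up-cast per
# invocation, each touching every element, on top of the matmul itself.
c = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)
```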
no code implementations • 18 Jun 2020 • Animesh Jain, Shoubhik Bhattacharya, Masahiro Masuda, Vin Sharma, Yida Wang
A deep learning compiler such as Apache TVM can enable the efficient execution of models from various frameworks on various targets.
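For context, a minimal sketch of the standard TVM compile-and-run flow for an ONNX model; the file name, input tensor name, and shape below are illustrative placeholders.

```python
import numpy as np
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Import the model into Relay (input name and shape are placeholders).
model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(model, shape={"input": (1, 3, 224, 224)})

# Compile for a CPU target with full optimizations.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Run the compiled module on random input data.
dev = tvm.cpu(0)
module = graph_executor.GraphModule(lib["default"](dev))
data = np.random.randn(1, 3, 224, 224).astype("float32")
module.set_input("input", tvm.nd.array(data))
module.run()
out = module.get_output(0).numpy()
```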
1 code implementation • 27 Feb 2020 • Hongbin Zheng, Sejong Oh, Huiqing Wang, Preston Briggs, Jiading Gai, Animesh Jain, Yizhi Liu, Rich Heaton, Randy Huang, Yida Wang
Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost.