no code implementations • 23 May 2022 • Peng Hu, Xi Peng, Hongyuan Zhu, Mohamed M. Sabry Aly, Jie Lin
Numerous network compression methods such as pruning and quantization have been proposed to significantly reduce model size; the key is to find a suitable compression allocation (e.g., pruning sparsity and quantization codebook) for each layer.
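A minimal sketch of the layer-wise allocation idea, under assumptions of my own: each layer greedily receives the (sparsity, bit-width) pair with the lowest proxy loss that still fits a global size budget. The helper names (`proxy_loss`, `allocate`) and the greedy strategy are illustrative, not the paper's actual search method.

```python
# Hypothetical greedy per-layer compression allocation under a size budget.
from itertools import product

def allocate(layers, sparsities, bitwidths, size_budget, proxy_loss):
    """layers: list of (name, num_params); proxy_loss(name, s, b) -> float."""
    plan, total_bits = {}, 0.0
    for name, n_params in layers:
        # Score every candidate (sparsity, bit-width) pair for this layer.
        candidates = []
        for s, b in product(sparsities, bitwidths):
            size = n_params * (1.0 - s) * b  # compressed size in bits
            candidates.append((proxy_loss(name, s, b), size, s, b))
        candidates.sort()  # prefer low proxy loss, break ties by size
        for loss, size, s, b in candidates:
            if total_bits + size <= size_budget:
                plan[name], total_bits = (s, b), total_bits + size
                break  # layer allocated; a layer may stay unassigned if nothing fits
    return plan
```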
no code implementations • 29 Sep 2021 • Zhe Wang, Jie Lin, Xue Geng, Mohamed M. Sabry Aly, Vijay Chandrasekhar
We formulate the quantization of deep neural networks as a rate-distortion optimization problem and present an ultra-fast algorithm to search for the bit allocation across channels.
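One way to read the rate-distortion formulation is through the classical closed-form allocation for independent Gaussian sources, which gives higher-variance channels more bits. The sketch below applies that textbook rule; it is not the paper's ultra-fast search algorithm.

```python
# Textbook rate-distortion bit allocation: b_i = R_avg + 0.5*log2(var_i / geo_mean).
import numpy as np

def rd_bit_allocation(channel_vars, avg_bits):
    channel_vars = np.asarray(channel_vars, dtype=np.float64)
    geo_mean = np.exp(np.mean(np.log(channel_vars)))
    bits = avg_bits + 0.5 * np.log2(channel_vars / geo_mean)
    return np.clip(np.round(bits), 1, 16)  # integer bit-widths, clamped

# Channels with larger weight variance receive more bits: -> [3. 4. 5.]
print(rd_bit_allocation([0.5, 2.0, 8.0], avg_bits=4))
```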
no code implementations • CVPR 2021 • Tianyi Zhang, Jie Lin, Peng Hu, Bin Zhao, Mohamed M. Sabry Aly
Unlike convolutions, which are inherently parallel, the de-facto standard for NMS, GreedyNMS, cannot be easily parallelized and can thus become the performance bottleneck in convolutional object detection pipelines.
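To make the parallelism contrast concrete, the sketch below vectorizes suppression by computing the full pairwise IoU matrix in one step and discarding, in one shot, every box that a higher-scoring box overlaps beyond the threshold. This one-shot variant can differ from sequential GreedyNMS on chains of overlapping boxes; it only illustrates why GreedyNMS's sequential data dependence blocks parallelization, and is not the method proposed in the paper.

```python
# Parallel-friendly one-shot suppression via a vectorized IoU matrix.
import numpy as np

def iou_matrix(boxes):
    """boxes: (N, 4) array of [x1, y1, x2, y2]."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area[:, None] + area[None, :] - inter)

def parallel_nms(boxes, scores, thresh=0.5):
    """scores: (N,) array; returns a boolean keep mask."""
    iou = iou_matrix(boxes)
    # suppressed[i, j] is True when box j outscores box i and overlaps it.
    suppressed = (scores[None, :] > scores[:, None]) & (iou > thresh)
    return ~suppressed.any(axis=1)  # keep boxes no better box suppresses
```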
no code implementations • 25 Sep 2019 • Zhe Wang, Jie Lin, Mohamed M. Sabry Aly, Sean I Young, Vijay Chandrasekhar, Bernd Girod
In this paper, we address the important problem of how to optimize the bit allocation of weights and activations for deep CNN compression.
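A standard way to pose such a joint bit-allocation problem is as a Lagrangian rate-distortion trade-off: pick each tensor's bit-width to minimize distortion plus lambda times rate, then tune lambda to meet the total budget. The sketch below is that textbook recipe with hypothetical helpers (`tensors`, `distortion`), not the optimization procedure from the paper.

```python
# Hypothetical Lagrangian bit allocation over weight and activation tensors.
def lagrangian_allocate(tensors, bitwidths, distortion, budget,
                        lo=1e-6, hi=1e3, iters=40):
    """tensors: list of (name, num_elems); distortion(name, b) -> float."""
    def solve(lam):
        plan, rate = {}, 0.0
        for name, n in tensors:
            best = min(bitwidths, key=lambda b: distortion(name, b) + lam * n * b)
            plan[name] = best
            rate += n * best
        return plan, rate

    for _ in range(iters):
        lam = (lo * hi) ** 0.5  # geometric bisection on lambda
        plan, rate = solve(lam)
        if rate > budget:
            lo = lam  # over budget: penalize rate more strongly
        else:
            hi = lam
    return solve(hi)[0]  # hi side satisfies the budget
```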
no code implementations • 4 Jan 2019 • Xue Geng, Jie Fu, Bin Zhao, Jie Lin, Mohamed M. Sabry Aly, Christopher Pal, Vijay Chandrasekhar
This paper addresses a challenging problem: how to reduce energy consumption without incurring a performance drop when deploying deep neural networks (DNNs) at the inference stage.
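A common way to make energy a first-class objective is a per-layer proxy built from MAC counts and memory accesses, since off-chip access dominates compute cost. The sketch below uses illustrative per-operation costs (ballpark 45 nm figures in the spirit of Horowitz's estimates), not numbers or the energy model from this paper.

```python
# Crude, assumed energy proxy: MAC energy plus (much costlier) DRAM access energy.
E_MAC_PJ = 3.1      # ~energy per 32-bit multiply-accumulate (pJ), assumed
E_DRAM_PJ = 640.0   # ~energy per 32-bit DRAM access (pJ), assumed

def layer_energy_pj(macs, weight_reads, act_reads, act_writes):
    """Estimated energy (picojoules) for one layer's inference."""
    mem_accesses = weight_reads + act_reads + act_writes
    return macs * E_MAC_PJ + mem_accesses * E_DRAM_PJ

# Pruning 50% of weights roughly halves MACs and weight reads:
dense = layer_energy_pj(macs=1e8, weight_reads=1e6, act_reads=5e5, act_writes=5e5)
pruned = layer_energy_pj(macs=5e7, weight_reads=5e5, act_reads=5e5, act_writes=5e5)
print(f"energy saving: {1 - pruned / dense:.1%}")
```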