Search Results for author: Zhenyu Gu

Found 5 papers, 1 paper with code

Boosting Deep Neural Network Efficiency with Dual-Module Inference

no code implementations ICML 2020 Liu Liu, Lei Deng, Zhaodong Chen, Yuke Wang, Shuangchen Li, Jingwei Zhang, Yihua Yang, Zhenyu Gu, Yufei Ding, Yuan Xie

Using Deep Neural Networks (DNNs) in machine learning tasks is promising for delivering high-quality results, but meeting stringent latency requirements and energy constraints is challenging because of the memory- and compute-bound execution patterns of DNNs.

Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention

no code implementations 18 Oct 2021 Zhe Zhou, Junlin Liu, Zhenyu Gu, Guangyu Sun

To enable such an algorithm with lower latency and better energy efficiency, we also propose an Energon co-processor architecture.
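Energon's core idea is dynamic sparse attention: rather than attending to every key, each query keeps only its most important query-key pairs. As a rough illustration of that idea only (the paper's actual algorithm uses a mixed-precision, multi-round filtering scheme on dedicated hardware), a toy top-k sparse attention in NumPy might look like this; `keep`, the per-query budget, is a made-up parameter for this sketch:

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep=4):
    """Toy top-k sparse attention: for each query, retain only the `keep`
    highest-scoring keys before the softmax and mask out the rest.
    Illustrative sketch only, not the Energon algorithm itself."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n_q, n_k) attention logits
    # Per-row threshold: the keep-th largest score for each query.
    thresh = np.sort(scores, axis=-1)[:, -keep][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    # Numerically stable softmax over the surviving entries.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the pruned entries never reach the softmax or the value matmul, a hardware implementation can skip the corresponding memory traffic and computation, which is where the latency and energy savings come from.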

Edge-computing

Distribution Adaptive INT8 Quantization for Training CNNs

no code implementations 9 Feb 2021 Kang Zhao, Sida Huang, Pan Pan, Yinghan Li, Yingya Zhang, Zhenyu Gu, Yinghui Xu

Research has demonstrated that low bit-width (e.g., INT8) quantization can be employed to accelerate the inference process.
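For context, plain INT8 quantization maps floating-point tensors onto 8-bit integers via a scale factor. The sketch below shows a generic symmetric per-tensor scheme, not the distribution-adaptive method this paper proposes for training:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: scale so the largest
    magnitude maps to 127, then round and clip. A generic sketch,
    not the paper's distribution-adaptive scheme."""
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximate float tensor from INT8 values."""
    return q.astype(np.float32) * scale
```

The quantization error per element is bounded by about half the scale, which is why a quantization scheme that adapts to the actual value distribution (rather than just the max magnitude) can reduce accuracy loss.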

Image Classification object-detection +3

Dual-module Inference for Efficient Recurrent Neural Networks

no code implementations 25 Sep 2019 Liu Liu, Lei Deng, Shuangchen Li, Jingwei Zhang, Yihua Yang, Zhenyu Gu, Yufei Ding, Yuan Xie

Using Recurrent Neural Networks (RNNs) in sequence modeling tasks is promising for delivering high-quality results, but meeting stringent latency requirements is challenging because of the memory-bound execution pattern of RNNs.
