no code implementations • 18 Sep 2022 • Qing Jin, Zhiyu Chen, Jian Ren, Yanyu Li, Yanzhi Wang, Kaiyuan Yang
In this paper, we propose a method for training quantized networks that incorporates PIM quantization, which is ubiquitous across PIM systems.
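The excerpt does not spell out how PIM quantization enters training, but a common setting is that analog partial sums read out through ADCs have limited resolution. Below is a minimal sketch of how such partial-sum quantization could be simulated inside quantization-aware training; the names `adc_bits`, `column_size`, and `psum_range`, and the fixed clipping range, are illustrative assumptions rather than the paper's formulation.

```python
import torch

def fake_quantize(x, num_bits, x_max):
    """Uniform symmetric fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x_max / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (q - x).detach()  # forward uses q, backward sees the identity

def pim_linear(x_q, w_q, adc_bits=6, column_size=64, psum_range=8.0):
    """Matrix multiply split into crossbar-sized groups of input channels;
    each partial sum is fake-quantized to emulate a finite-resolution ADC."""
    out = 0.0
    for start in range(0, x_q.shape[1], column_size):
        psum = x_q[:, start:start + column_size] @ w_q[start:start + column_size, :]
        out = out + fake_quantize(psum, adc_bits, psum_range)
    return out
```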
1 code implementation • ICLR 2022 • Qing Jin, Jian Ren, Richard Zhuang, Sumant Hanumante, Zhengang Li, Zhiyu Chen, Yanzhi Wang, Kaiyuan Yang, Sergey Tulyakov
Our approach achieves comparable or better performance than not only existing quantization techniques that rely on INT32 multiplication or floating-point arithmetic, but also the full-precision counterparts, reaching state-of-the-art results.
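The listing does not include the paper's exact quantizer, so the following is only a hedged sketch of fixed-point arithmetic in general: tensors are stored as 8-bit mantissas with fractional lengths, so a layer reduces to an 8-bit integer matmul followed by a power-of-two shift rather than a floating-point or INT32 rescaling multiply. The helpers `to_fixed_point` and `fixed_point_matmul`, and the assumption that fractional lengths are given, are illustrative.

```python
import numpy as np

def to_fixed_point(x, frac_len):
    """Encode a float array as int8 mantissas with a shared fractional length:
    value ~= mantissa * 2**(-frac_len)."""
    return np.clip(np.round(x * 2 ** frac_len), -128, 127).astype(np.int8)

def fixed_point_matmul(a_m, a_fl, w_m, w_fl, out_fl):
    """Integer matmul of 8-bit mantissas, accumulated in int32; the result is
    realigned to the output fractional length with a bit shift instead of a
    floating-point or INT32 rescaling multiply."""
    acc = a_m.astype(np.int32) @ w_m.astype(np.int32)
    shift = (a_fl + w_fl) - out_fl          # power-of-two rescaling only
    return np.right_shift(acc, shift) if shift >= 0 else np.left_shift(acc, -shift)
```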
1 code implementation • NeurIPS 2021 • Geng Yuan, Xiaolong Ma, Wei Niu, Zhengang Li, Zhenglun Kong, Ning Liu, Yifan Gong, Zheng Zhan, Chaoyang He, Qing Jin, Siyue Wang, Minghai Qin, Bin Ren, Yanzhi Wang, Sijia Liu, Xue Lin
Systematic evaluations of accuracy, training speed, and memory footprint are conducted, in which the proposed MEST framework consistently outperforms representative SOTA works.
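MEST's exact update rule is not reproduced in this excerpt; the following is a generic prune-and-regrow step in the spirit of dynamic sparse training, with `regrow_frac` and the gradient-based regrowth criterion as illustrative assumptions rather than the paper's algorithm.

```python
import torch

def prune_and_regrow(weight, grad, mask, regrow_frac=0.1):
    """One generic prune-and-regrow step for dynamic sparse training: drop the
    smallest-magnitude active weights, then regrow the same number of
    connections where the dense gradient magnitude is largest."""
    w, g, m = weight.view(-1), grad.view(-1), mask.view(-1)
    k = max(1, int(regrow_frac * int(m.sum().item())))

    # Prune: among active weights, deactivate the k with smallest magnitude.
    active_mag = torch.where(m.bool(), w.abs(), torch.full_like(w, float('inf')))
    m[torch.topk(active_mag, k, largest=False).indices] = 0.0

    # Regrow: among inactive positions, activate the k with largest gradient.
    inactive_grad = torch.where(m.bool(), torch.full_like(g, -float('inf')), g.abs())
    grow_idx = torch.topk(inactive_grad, k, largest=True).indices
    m[grow_idx] = 1.0
    with torch.no_grad():
        w[grow_idx] = 0.0  # newly grown weights start from zero
    return mask
```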
no code implementations • 6 Jul 2021 • Zhiyu Chen, Zhanghao Yu, Qing Jin, Yan He, Jingyu Wang, Sheng Lin, Dai Li, Yanzhi Wang, Kaiyuan Yang
A compact, accurate, and bitwidth-programmable in-memory computing (IMC) static random-access memory (SRAM) macro, named CAP-RAM, is presented for energy-efficient convolutional neural network (CNN) inference.
1 code implementation • CVPR 2021 • Qing Jin, Jian Ren, Oliver J. Woodford, Jiazhuo Wang, Geng Yuan, Yanzhi Wang, Sergey Tulyakov
In this work, we aim to address these issues by introducing a teacher network that provides a search space in which efficient network architectures can be found, in addition to performing knowledge distillation.
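The abstract sentence does not give the distillation objective, so the sketch below shows only a generic form of distillation for image-to-image models: the student matches the teacher's output and, through a learned projection that aligns channel widths, one intermediate feature map. `proj`, `lam`, and the choice of L1/MSE losses are assumptions for illustration, not the paper's loss.

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, student_feat, teacher_feat, proj, lam=1.0):
    """A minimal distillation objective for image-to-image models: match the
    teacher's output image and, via a learned projection `proj` that aligns
    channel widths, one intermediate feature map."""
    out_loss = F.l1_loss(student_out, teacher_out)
    feat_loss = F.mse_loss(proj(student_feat), teacher_feat)
    return out_loss + lam * feat_loss
```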
no code implementations • 19 Feb 2021 • Ning Liu, Geng Yuan, Zhengping Che, Xuan Shen, Xiaolong Ma, Qing Jin, Jian Ren, Jian Tang, Sijia Liu, Yanzhi Wang
In deep model compression, the recent finding "Lottery Ticket Hypothesis" (LTH) (Frankle & Carbin, 2018) pointed out that there could exist a winning ticket (i.e., a properly pruned sub-network together with its original weight initialization) that can achieve performance competitive with the original dense network.
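For readers unfamiliar with LTH, a minimal sketch of iterative magnitude pruning with rewinding (the procedure of Frankle & Carbin, not this paper's contribution) is shown below; `train_fn`, `prune_frac`, and `rounds` are hypothetical placeholders supplied by the caller.

```python
import copy
import torch

def find_winning_ticket(model, train_fn, prune_frac=0.2, rounds=5):
    """Iterative magnitude pruning with rewinding: train, prune the
    smallest-magnitude weights, then reset the surviving weights to their
    original initialization before the next round."""
    init_state = copy.deepcopy(model.state_dict())
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        train_fn(model, masks)                      # user-supplied masked training loop
        for name, param in model.named_parameters():
            if name not in masks:
                continue
            mask = masks[name]
            k = int(prune_frac * int(mask.sum().item()))
            if k == 0:
                continue
            alive = param.detach().abs()[mask.bool()]
            threshold = torch.kthvalue(alive, k).values
            mask[param.detach().abs() <= threshold] = 0.0
        model.load_state_dict(init_state)           # rewind weights to the original init
    return masks
```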
no code implementations • CVPR 2021 • Zhengang Li, Geng Yuan, Wei Niu, Pu Zhao, Yanyu Li, Yuxuan Cai, Xuan Shen, Zheng Zhan, Zhenglun Kong, Qing Jin, Zhiyu Chen, Sijia Liu, Kaiyuan Yang, Bin Ren, Yanzhi Wang, Xue Lin
With the increasing demand to efficiently deploy DNNs on mobile edge devices, it becomes increasingly important to reduce unnecessary computation and increase execution speed.
1 code implementation • 4 Jul 2020 • Linjie Yang, Qing Jin
Model quantization helps to reduce model size and latency of deep neural networks.
3 code implementations • 21 Dec 2019 • Qing Jin, Linjie Yang, Zhenyu Liao
To deal with this problem, we propose a simple yet effective technique, named scale-adjusted training (SAT), to comply with the discovered rules and facilitate efficient training.
1 code implementation • CVPR 2020 • Qing Jin, Linjie Yang, Zhenyu Liao
With our proposed techniques applied to a range of models, including MobileNet-V1/V2 and ResNet-50, we demonstrate that the bit-width of weights and activations is a new option for adaptively executable deep neural networks, offering a distinct opportunity for an improved accuracy-efficiency trade-off as well as instant adaptation to platform constraints in real-world applications.
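A minimal sketch of what a switchable bit-width layer might look like is given below; the class name, the straight-through quantizer, and the `bit_choices` values are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableQuantLinear(nn.Linear):
    """A linear layer whose weight/activation bit-width can be switched at run
    time, so one set of weights serves several precisions."""
    def __init__(self, in_features, out_features, bit_choices=(4, 6, 8)):
        super().__init__(in_features, out_features)
        self.bit_choices = bit_choices
        self.active_bits = max(bit_choices)   # default to the highest precision

    @staticmethod
    def quantize(x, num_bits):
        qmax = 2 ** (num_bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
        return x + (q - x).detach()           # straight-through estimator

    def forward(self, x):
        w_q = self.quantize(self.weight, self.active_bits)
        x_q = self.quantize(x, self.active_bits)
        return F.linear(x_q, w_q, self.bias)

# At deployment, the precision can follow the platform budget, e.g.
#   layer.active_bits = 4   # low-latency mode
#   layer.active_bits = 8   # high-accuracy mode
```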