no code implementations • COLING 2022 • Keli Xie, Dongchen He, Jiaxin Zhuang, Siyuan Lu, Zhongfeng Wang
To better capture the dialogue information, we propose a 2D view of dialogue based on a time-speaker perspective, where the time and speaker streams of dialogue can be obtained as strengthened input.
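A minimal sketch of what such time and speaker streams could look like; the utterance format, function name, and grouping logic below are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: deriving "time" and "speaker" streams from a dialogue.
from collections import defaultdict

def build_streams(dialogue):
    """dialogue: list of (speaker, utterance) tuples in chronological order."""
    # Time stream: the original chronological sequence of utterances.
    time_stream = [utt for _, utt in dialogue]
    # Speaker streams: utterances grouped per speaker, chronological order kept.
    speaker_streams = defaultdict(list)
    for speaker, utt in dialogue:
        speaker_streams[speaker].append(utt)
    return time_stream, dict(speaker_streams)

dialogue = [("A", "Hi, how are you?"), ("B", "Fine, thanks."), ("A", "Great.")]
time_stream, speaker_streams = build_streams(dialogue)
```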
no code implementations • 24 Nov 2024 • Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst
To address these challenges, we investigate the sensitivity of activation precision across various LLM modules and its impact on overall model accuracy.
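As a rough illustration of how per-module activation sensitivity could be probed (not the paper's procedure), the sketch below fake-quantizes one module's output at a time via a forward hook and records the resulting accuracy; `eval_fn`, the module list, and the bit-widths are assumptions.

```python
import torch

def fake_quant(x, bits):
    # Symmetric uniform fake quantization of activations to `bits` bits.
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

def sensitivity_scan(model, named_modules, eval_fn, bit_widths=(8, 6, 4)):
    """Quantize one module's activations at a time and measure accuracy."""
    results = {}
    for name, module in named_modules:
        for bits in bit_widths:
            handle = module.register_forward_hook(
                lambda m, inp, out, b=bits: fake_quant(out, b))
            results[(name, bits)] = eval_fn(model)  # accuracy under this setting
            handle.remove()
    return results
```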
no code implementations • 21 Nov 2024 • Xinyan Liu, Huihong Shi, Yang Xu, Zhongfeng Wang
Transformer-based diffusion models, dubbed Diffusion Transformers (DiTs), have achieved state-of-the-art performance in image and video generation tasks.
no code implementations • 10 Oct 2024 • Yanbiao Liang, Huihong Shi, Zhongfeng Wang
While prior work has explored quantization for efficient ViTs to combine the benefits of efficient hybrid ViT architectures and quantization, it focuses on uniform quantization and overlooks the potential advantages of mixed quantization.
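A toy sketch of one way mixed-precision bit-widths could be assigned from measured layer sensitivities (illustration only, not the paper's search algorithm); the layer names, budget, and two-level bit choice are assumptions.

```python
import math

def assign_bits(sensitivity, avg_budget=6, high=8, low=4):
    """sensitivity: {layer_name: accuracy drop when quantized to `low` bits}."""
    n = len(sensitivity)
    # Number of low-bit layers needed so the average bit-width meets the budget.
    n_low = math.ceil(n * (high - avg_budget) / (high - low))
    ranked = sorted(sensitivity, key=sensitivity.get)   # least sensitive first
    return {name: (low if i < n_low else high) for i, name in enumerate(ranked)}

bits = assign_bits({"attn": 0.8, "mlp": 0.2, "patch_embed": 1.5, "head": 0.1})
# -> least sensitive layers ("head", "mlp") get 4 bits, the rest keep 8 bits
```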
no code implementations • 26 Sep 2024 • Shaobo Ma, Chao Fang, Haikuo Shao, Zhongfeng Wang
When integrated into LLMs, our method achieves up to 6.7× inference acceleration.
no code implementations • 7 Sep 2024 • Yang Xu, Huihong Shi, Zhongfeng Wang
To overcome these limitations, we propose NASH, a Neural architecture and Accelerator Search framework for multiplication-reduced Hybrid models.
no code implementations • 16 Jul 2024 • Yuhao Ji, Chao Fang, Shaobo Ma, Haikuo Shao, Zhongfeng Wang
Transformer models have revolutionized AI tasks, but their large size hinders real-world deployment on resource-constrained and latency-critical edge devices.
1 code implementation • 30 May 2024 • Huihong Shi, Xin Cheng, Wendong Mao, Zhongfeng Wang
Vision Transformers (ViTs) have excelled in computer vision tasks but are memory-consuming and computation-intensive, challenging their deployment on resource-constrained devices.
1 code implementation • 6 May 2024 • Huihong Shi, Haikuo Shao, Wendong Mao, Zhongfeng Wang
Motivated by the huge success of Transformers in the field of natural language processing (NLP), Vision Transformers (ViTs) have been rapidly developed and achieved remarkable performance in various computer vision tasks.
no code implementations • 29 Mar 2024 • Haikuo Shao, Huihong Shi, Wendong Mao, Zhongfeng Wang
Vision Transformers (ViTs) have achieved significant success in computer vision.
no code implementations • 22 Feb 2024 • Miaoxin Wang, Xiao Wu, Jun Lin, Zhongfeng Wang
In particular, it demonstrates efficient support for large-kernel CNNs, achieving throughputs of 169.68 GOPS and 244.55 GOPS for RepLKNet-31 and PyConvResNet-50, respectively, both of which are implemented on hardware for the first time.
no code implementations • 22 Jan 2024 • Yuhao Ji, Chao Fang, Zhongfeng Wang
Existing binary Transformers are promising in edge deployment due to their compact model size, low computational complexity, and considerable inference accuracy.
no code implementations • 17 Dec 2023 • Siyu Zhang, Wendong Mao, Huihong Shi, Zhongfeng Wang
Video compression is widely used in digital television, surveillance systems, and virtual reality.
no code implementations • 22 Sep 2023 • Chao Fang, Wei Sun, Aojun Zhou, Zhongfeng Wang
At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights during both forward and backward passes of DNN training, which can significantly reduce the computational cost while maintaining model accuracy.
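For context, a minimal sketch of plain N:M magnitude pruning (e.g. 2:4): within every group of M consecutive weights, keep the N largest-magnitude entries and zero the rest. This illustrates only the sparsity pattern, not the paper's bidirectional (forward plus backward) BDWP scheme.

```python
import torch

def nm_prune(weight, n=2, m=4):
    """weight: 2-D tensor [out_features, in_features], with in_features % m == 0."""
    out_f, in_f = weight.shape
    groups = weight.reshape(out_f, in_f // m, m)
    # Indices of the (m - n) smallest-magnitude weights in each group.
    _, drop_idx = groups.abs().topk(m - n, dim=-1, largest=False)
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop_idx, 0.0)
    return (groups * mask).reshape(out_f, in_f)

w = torch.randn(8, 16)
w_sparse = nm_prune(w)   # every group of 4 weights keeps its 2 largest entries
```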
no code implementations • 15 Sep 2023 • Longwei Huang, Chao Fang, Qiong Li, Jun Lin, Zhongfeng Wang
However, many edge devices struggle to boost inference throughput of various quantized DNNs due to the varying quantization levels, and these devices lack floating-point (FP) support for on-device learning, which prevents them from improving model accuracy while ensuring data privacy.
1 code implementation • 16 Aug 2023 • Minghao She, Wendong Mao, Huihong Shi, Zhongfeng Wang
In this paper, we propose a double-win framework for ideal and blind SR tasks, named S2R, which includes a lightweight transformer-based SR model (the S2R transformer) and a novel coarse-to-fine training strategy, achieving excellent visual results under both ideal and random fuzzy conditions.
no code implementations • 16 Nov 2022 • Siyuan Lu, Chenchen Zhou, Keli Xie, Jun Lin, Zhongfeng Wang
Based on ELBERT, an innovative method to accelerate text processing on the GPU platform is developed, solving the difficult problem of making the early exit mechanism work more effectively with a large input batch size.
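A hedged sketch of batched early exit in general (not ELBERT's specific mechanism): after each layer, samples whose exit-classifier confidence passes a threshold are finalized and removed from the batch, so later layers run on fewer samples. The module interfaces and threshold are assumptions.

```python
import torch

def early_exit_forward(layers, exit_heads, hidden, threshold=0.9):
    """layers / exit_heads: equal-length module lists; hidden: [batch, seq, dim]."""
    batch = hidden.size(0)
    preds = torch.full((batch,), -1, dtype=torch.long)
    active = torch.arange(batch)                      # indices still in flight
    for layer, head in zip(layers, exit_heads):
        hidden = layer(hidden)
        probs = head(hidden[:, 0]).softmax(dim=-1)    # classify on the [CLS] token
        conf, cls = probs.max(dim=-1)
        done = conf >= threshold
        preds[active[done]] = cls[done]               # finalize confident samples
        hidden, active = hidden[~done], active[~done] # shrink the working batch
        if active.numel() == 0:
            return preds
    preds[active] = cls[~done]                        # leftovers use the last exit
    return preds
```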
1 code implementation • 9 Nov 2022 • Jyotikrishna Dass, Shang Wu, Huihong Shi, Chaojian Li, Zhifan Ye, Zhongfeng Wang, Yingyan Lin
Unlike sparsity-based Transformer accelerators for NLP, ViTALiTy unifies both low-rank and sparse components of the attention in ViTs.
no code implementations • 4 Nov 2022 • Mingyu Zhu, Jiapeng Luo, Wendong Mao, Zhongfeng Wang
In this paper, an efficient hardware accelerator is proposed for deep forest models; it is also the first work to implement Deep Forest on an FPGA.
1 code implementation • 28 Oct 2022 • Jiayi Tian, Chao Fang, Haonan Wang, Zhongfeng Wang
Pre-trained BERT models have achieved impressive accuracy on natural language processing (NLP) tasks.
2 code implementations • 24 Oct 2022 • Huihong Shi, Haoran You, Yang Zhao, Zhongfeng Wang, Yingyan Lin
Multiplication is arguably the most cost-dominant operation in modern deep neural networks (DNNs), limiting their achievable efficiency and thus more extensive deployment in resource-constrained applications.
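For illustration (not the paper's architecture), one common route to multiplication-reduced DNNs is rounding weights to signed powers of two, so each multiply reduces to a sign flip plus a bit shift of the quantized activation; the exponent range below is an assumed choice.

```python
import torch

def to_power_of_two(w, min_exp=-8, max_exp=0):
    """Round each weight to the nearest signed power of two."""
    sign = torch.sign(w)
    exp = torch.clamp(torch.round(torch.log2(w.abs().clamp(min=1e-12))),
                      min_exp, max_exp)
    return sign * torch.pow(2.0, exp)   # the value a shift-based multiplier realizes

w = torch.randn(4, 4) * 0.5
w_shift = to_power_of_two(w)            # e.g. 0.3 -> 0.25 (a right shift by 2)
```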
no code implementations • 18 Oct 2022 • Ziqi Su, Wendong Mao, Zhongfeng Wang, Jun Lin, WenQiang Wang, Haitao Sun
3D deconvolution (DeConv), an important computation in 3D-GANs, has significantly higher computational complexity than 2D DeConv.
no code implementations • 14 Oct 2022 • Zilun Wang, Wendong Mao, Peixiang Yang, Zhongfeng Wang, Jun Lin
The submanifold sparse convolutional network (SSCN) has been widely used for point cloud processing due to its unique advantages in terms of visual results.
no code implementations • 12 Aug 2022 • Chao Fang, Aojun Zhou, Zhongfeng Wang
(1) From the algorithm perspective, we propose a sparsity inheritance mechanism along with an inherited dynamic pruning (IDP) method to rapidly obtain a series of N:M sparse candidate Transformers.
no code implementations • 1 Aug 2022 • Lang Feng, Wenjian Liu, Chuliang Guo, Ke Tang, Cheng Zhuo, Zhongfeng Wang
To improve design quality while reducing cost, design automation for neural network accelerators has been proposed, where design space exploration algorithms automatically search for optimized accelerator designs within a design space.
no code implementations • 27 Dec 2021 • Xian Wei, Yanhui Huang, Yangyu Xu, Mingsong Chen, Hai Lan, Yuanxiang Li, Zhongfeng Wang, Xuan Tang
Learning deep models that are both lightweight and robust is necessary for such equipment.
no code implementations • 12 Oct 2021 • Zhuang Shao, Xiaoliang Chen, Li Du, Lei Chen, Yuan Du, Wei Zhuang, Huadong Wei, Chenjia Xie, Zhongfeng Wang
To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature maps.
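A back-of-the-envelope sketch of why interlayer feature maps dominate on-chip buffering; the layer shape and 8-bit activations below are assumed numbers, not figures from the paper.

```python
def feature_map_bytes(channels, height, width, bits=8):
    """Storage needed to buffer one layer's activation map."""
    return channels * height * width * bits // 8

# e.g. a 128-channel 56x56 activation map at 8 bits:
print(feature_map_bytes(128, 56, 56) / 1024, "KiB")   # ~392 KiB for one layer
```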
no code implementations • 1 Jul 2021 • Keli Xie, Siyuan Lu, Meiqi Wang, Zhongfeng Wang
Despite the great success in the Natural Language Processing (NLP) area, large pre-trained language models like BERT are not well-suited for resource-constrained or real-time applications owing to their large number of parameters and slow inference speed.
no code implementations • 24 Jun 2021 • Yubo Shi, Meiqi Wang, Siyi Chen, Jinghe Wei, Zhongfeng Wang
To achieve higher accuracy in machine learning tasks, very deep convolutional neural networks (CNNs) have been designed in recent years.
no code implementations • 18 Sep 2020 • Siyuan Lu, Meiqi Wang, Shuang Liang, Jun Lin, Zhongfeng Wang
Designing hardware accelerators for deep neural networks (DNNs) is highly desirable.
no code implementations • 6 Sep 2019 • Jinming Lu, Siyuan Lu, Zhisheng Wang, Chao Fang, Jun Lin, Zhongfeng Wang, Li Du
With the increasing size of Deep Neural Network (DNN) models, the high memory space requirements and computational complexity have become an obstacle for efficient DNN implementations.
no code implementations • 31 May 2019 • Haonan Wang, Jun Lin, Zhongfeng Wang
Deep 3-dimensional (3D) convolutional networks (ConvNets) have shown promising performance on video recognition tasks because of their powerful spatio-temporal information fusion ability.
no code implementations • 8 May 2019 • Siyuan Lu, Jinming Lu, Jun Lin, Zhongfeng Wang
Firstly, we improve the beam search decoding algorithm to save storage space.
Automatic Speech Recognition (ASR) +4
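For reference, a plain beam-search decoding sketch (the paper's storage-saving variant is not reproduced here); `step_log_probs(prefix)` is an assumed callback returning {token: log_prob} for the next step.

```python
def beam_search(step_log_probs, eos, beam_width=4, max_len=20):
    beams = [([], 0.0)]                               # (token sequence, score)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:
                candidates.append((seq, score))       # already finished
                continue
            for tok, lp in step_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq and seq[-1] == eos for seq, _ in beams):
            break
    return beams[0]                                   # best hypothesis and score
```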
no code implementations • 4 Jul 2018 • Zhisheng Wang, Fangxuan Sun, Jun Lin, Zhongfeng Wang, Bo Yuan
Based on the developed guideline and adaptive dropping mechanism, an innovative soft-guided adaptively-dropped (SGAD) neural network is proposed in this paper.
no code implementations • 10 Jul 2016 • Fangxuan Sun, Jun Lin, Zhongfeng Wang
In this paper, an equal distance nonuniform quantization (ENQ) scheme and a K-means clustering nonuniform quantization (KNQ) scheme are proposed to reduce the required memory storage when low-complexity hardware or software implementations are considered.
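A minimal sketch of the two ideas as I read them (illustration only, not the paper's exact schemes): ENQ is assumed here to place equally spaced levels between the weight extremes, while KNQ learns the levels with K-means so that only a small codebook plus per-weight indices need to be stored.

```python
import numpy as np
from sklearn.cluster import KMeans

def enq(weights, bits=4):
    """Quantize to 2**bits equally spaced levels (assumed ENQ behavior)."""
    levels = np.linspace(weights.min(), weights.max(), 2 ** bits)
    idx = np.abs(weights[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx], levels, idx

def knq(weights, bits=4):
    """Quantize to 2**bits K-means centroids (assumed KNQ behavior)."""
    km = KMeans(n_clusters=2 ** bits, n_init=10).fit(weights.reshape(-1, 1))
    levels = km.cluster_centers_.ravel()
    return levels[km.labels_], levels, km.labels_

w = np.random.randn(1024).astype(np.float32)
w_enq, _, _ = enq(w)    # equal-distance levels
w_knq, _, _ = knq(w)    # data-driven levels, usually lower quantization error
```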