Search Results for author: Zhongfeng Wang

Found 27 papers, 4 papers with code

View Dialogue in 2D: A Two-stream Model in Time-speaker Perspective for Dialogue Summarization and beyond

no code implementations • COLING 2022 • Keli Xie, Dongchen He, Jiaxin Zhuang, Siyuan Lu, Zhongfeng Wang

To better capture the dialogue information, we propose a 2D view of dialogue based on a time-speaker perspective, where the time and speaker streams of dialogue can be obtained as strengthened input.

Document Summarization • Machine Reading Comprehension
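The time-speaker view described in the snippet above can be illustrated with a small sketch: given a dialogue as (speaker, utterance) pairs, the time stream keeps utterances in turn order while the speaker stream regroups them per speaker. This is a minimal illustration of the idea, assuming plain Python; all names are hypothetical and it is not the authors' implementation.

```python
from collections import defaultdict

def build_streams(dialogue):
    """dialogue: list of (speaker, utterance) tuples in turn order.

    Returns the two 'views' of the dialogue: the time stream (turn order)
    and the speaker stream (utterances regrouped per speaker).
    """
    # Time stream: utterances exactly as they occur, tagged with the speaker.
    time_stream = [f"{spk}: {utt}" for spk, utt in dialogue]

    # Speaker stream: concatenate each speaker's utterances into one block.
    per_speaker = defaultdict(list)
    for spk, utt in dialogue:
        per_speaker[spk].append(utt)
    speaker_stream = [f"{spk}: " + " ".join(utts) for spk, utts in per_speaker.items()]

    return time_stream, speaker_stream

dialogue = [("Alice", "Hi, can we move the meeting?"),
            ("Bob", "Sure, when works for you?"),
            ("Alice", "Thursday at 10.")]
print(build_streams(dialogue))
```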

An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes

no code implementations • 22 Feb 2024 • Miaoxin Wang, Xiao Wu, Jun Lin, Zhongfeng Wang

In particular, it demonstrates efficient support for large-kernel CNNs, achieving throughputs of 169.68 GOPS and 244.55 GOPS for RepLKNet-31 and PyConvResNet-50, respectively, both of which are implemented in hardware for the first time.

Computational Efficiency

BETA: Binarized Energy-Efficient Transformer Accelerator at the Edge

no code implementations • 22 Jan 2024 • Yuhao Ji, Chao Fang, Zhongfeng Wang

Existing binary Transformers are promising in edge deployment due to their compact model size, low computational complexity, and considerable inference accuracy.

Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design

no code implementations • 22 Sep 2023 • Chao Fang, Wei Sun, Aojun Zhou, Zhongfeng Wang

At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights during both forward and backward passes of DNN training, which can significantly reduce the computational cost while maintaining model accuracy.

Computational Efficiency • Scheduling
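As a rough illustration of the N:M sparsity that BDWP exploits (not the bidirectional pruning algorithm itself), the sketch below builds a 2:4 mask by keeping the two largest-magnitude weights in every group of four. It assumes NumPy; the group size and N are illustrative.

```python
import numpy as np

def nm_sparsify(weights, n=2, m=4):
    """Zero out all but the n largest-magnitude entries in each group of m.

    weights: 1-D array whose length is a multiple of m.
    Returns (pruned_weights, mask).
    """
    groups = weights.reshape(-1, m)
    # Indices that would sort each group by |w|; keep the top-n per group.
    order = np.argsort(np.abs(groups), axis=1)
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, order[:, -n:], True, axis=1)
    return (groups * mask).reshape(weights.shape), mask.reshape(weights.shape)

w = np.array([0.1, -0.8, 0.05, 0.4, -0.3, 0.9, 0.2, -0.05])
pruned, mask = nm_sparsify(w)
print(pruned)   # two non-zeros survive in each group of four
```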

A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge

no code implementations • 15 Sep 2023 • Longwei Huang, Chao Fang, Qiong Li, Jun Lin, Zhongfeng Wang

However, many edge devices struggle to boost inference throughput of various quantized DNNs due to the varying quantization levels, and these devices lack floating-point (FP) support for on-device learning, which prevents them from improving model accuracy while ensuring data privacy.

Quantization

S2R: Exploring a Double-Win Transformer-Based Framework for Ideal and Blind Super-Resolution

1 code implementation • 16 Aug 2023 • Minghao She, Wendong Mao, Huihong Shi, Zhongfeng Wang

In this paper, we propose a double-win framework for ideal and blind SR tasks, named S2R, including a light-weight transformer-based SR model (S2R transformer) and a novel coarse-to-fine training strategy, which can achieve excellent visual results under both ideal and random blur conditions.

Blind Super-Resolution • Super-Resolution +1

Fast and Accurate FSA System Using ELBERT: An Efficient and Lightweight BERT

no code implementations • 16 Nov 2022 • Siyuan Lu, Chenchen Zhou, Keli Xie, Jun Lin, Zhongfeng Wang

Based on ELBERT, an innovative method to accelerate text processing on the GPU platform is developed, solving the difficult problem of making the early exit mechanism work more effectively with a large input batch size.

Sentiment Analysis
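A toy sketch of why early exit is awkward with large batches and one way to handle it: after each layer, samples whose prediction confidence clears a threshold are removed from the batch, so later layers process a shrinking batch. This is a generic NumPy illustration with made-up layers and a hypothetical classifier head, not the ELBERT-based GPU method described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W_cls = rng.standard_normal((16, 3))        # shared toy classifier head

def classifier_head(h):
    """Softmax probabilities from the hypothetical per-layer classifier."""
    logits = h @ W_cls
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def batched_early_exit(x, layers, threshold=0.9):
    """Run the layer stack, letting confident samples leave the batch early."""
    preds = np.full(x.shape[0], -1)
    active, h = np.arange(x.shape[0]), x    # samples still in flight
    for layer in layers:
        h = np.tanh(h @ layer)              # stand-in for a real encoder layer
        probs = classifier_head(h)
        confident = probs.max(axis=1) >= threshold
        preds[active[confident]] = probs[confident].argmax(axis=1)
        active, h = active[~confident], h[~confident]   # shrink the batch
        if active.size == 0:
            break
    if active.size:                         # undecided samples use the last layer
        preds[active] = classifier_head(h).argmax(axis=1)
    return preds

layers = [rng.standard_normal((16, 16)) for _ in range(4)]
print(batched_early_exit(rng.standard_normal((8, 16)), layers))
```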

ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention

1 code implementation • 9 Nov 2022 • Jyotikrishna Dass, Shang Wu, Huihong Shi, Chaojian Li, Zhifan Ye, Zhongfeng Wang, Yingyan Lin

Unlike sparsity-based Transformer accelerators for NLP, ViTALiTy unifies both low-rank and sparse components of the attention in ViTs.
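A rough sketch of the first-order Taylor ("linear") attention idea that ViTALiTy's low-rank branch builds on: approximating exp(q·k) by 1 + q·k lets the attention output be computed from K^T V first, avoiding the n×n score matrix. The sparse component and the exact ViTALiTy formulation are not reproduced here; shapes and names are illustrative.

```python
import numpy as np

def taylor_linear_attention(Q, K, V):
    """Attention with softmax weights replaced by the first-order Taylor
    approximation 1 + q.k, computed in O(n * d^2) instead of O(n^2 * d).

    Q, K, V: arrays of shape (n, d).
    """
    n, d = Q.shape
    kv = K.T @ V                         # (d, d) summary of keys and values
    numerator = V.sum(axis=0) + Q @ kv   # (n, d): sum_j (1 + q_i.k_j) v_j
    denominator = n + Q @ K.sum(axis=0)  # (n,):   sum_j (1 + q_i.k_j)
    return numerator / denominator[:, None]

rng = np.random.default_rng(0)
Q, K, V = (0.1 * rng.standard_normal((6, 4)) for _ in range(3))
print(taylor_linear_attention(Q, K, V).shape)   # (6, 4)
```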

An Efficient FPGA-based Accelerator for Deep Forest

no code implementations • 4 Nov 2022 • Mingyu Zhu, Jiapeng Luo, Wendong Mao, Zhongfeng Wang

In this paper, an efficient hardware accelerator is proposed for deep forest models; this is also the first work to implement Deep Forest on an FPGA.

BEBERT: Efficient and Robust Binary Ensemble BERT

1 code implementation • 28 Oct 2022 • Jiayi Tian, Chao Fang, Haonan Wang, Zhongfeng Wang

Pre-trained BERT models have achieved impressive accuracy on natural language processing (NLP) tasks.

Binarization • Computational Efficiency +1

NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks

2 code implementations • 24 Oct 2022 • Huihong Shi, Haoran You, Yang Zhao, Zhongfeng Wang, Yingyan Lin

Multiplication is arguably the most cost-dominant operation in modern deep neural networks (DNNs), limiting their achievable efficiency and thus more extensive deployment in resource-constrained applications.

Neural Architecture Search

Accelerate Three-Dimensional Generative Adversarial Networks Using Fast Algorithm

no code implementations • 18 Oct 2022 • Ziqi Su, Wendong Mao, Zhongfeng Wang, Jun Lin, WenQiang Wang, Haitao Sun

3D deconvolution (DeConv), as an important computation of 3D-GAN, significantly increases computational complexity compared with 2D DeConv.

Computational Efficiency
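To make the complexity gap concrete, here is a quick back-of-the-envelope MAC count for a transposed convolution, taken from a direct-convolution view (output elements × kernel volume × channels). The layer shapes are made up for illustration and are not taken from the paper.

```python
def deconv_macs(out_spatial, kernel, c_in, c_out):
    """Rough multiply-accumulate count for a (transposed) convolution:
    every output element is a dot product over kernel volume * c_in."""
    out_elems = 1
    for s in out_spatial:
        out_elems *= s
    k_volume = 1
    for k in kernel:
        k_volume *= k
    return out_elems * k_volume * c_in * c_out

# Hypothetical layers with the same per-slice resolution and channels.
macs_2d = deconv_macs((32, 32),     (3, 3),    c_in=128, c_out=64)
macs_3d = deconv_macs((32, 32, 32), (3, 3, 3), c_in=128, c_out=64)
print(f"2D: {macs_2d:.2e} MACs, 3D: {macs_3d:.2e} MACs, "
      f"ratio: {macs_3d / macs_2d:.0f}x")   # depth (32) * kernel depth (3) = 96x
```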

An Efficient FPGA Accelerator for Point Cloud

no code implementations • 14 Oct 2022 • Zilun Wang, Wendong Mao, Peixiang Yang, Zhongfeng Wang, Jun Lin

The submanifold sparse convolutional network (SSCN) has been widely used for point cloud processing due to its unique advantages in visual results.

Autonomous Driving • Computational Efficiency
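A toy sketch of the submanifold sparse convolution idea behind SSCN: features live only at active voxel coordinates, and outputs are produced only at those same active sites, so empty space contributes nothing. This is a didactic NumPy/dict version, far from the optimized implementations used in practice; all names are illustrative.

```python
import numpy as np
from itertools import product

def submanifold_conv3d(features, weights, kernel=3):
    """features: dict mapping (x, y, z) voxel coords -> feature vector (c_in,).
    weights: dict mapping kernel offset (dx, dy, dz) -> (c_in, c_out) matrix.
    Outputs are produced only at the already-active input coordinates."""
    r = kernel // 2
    offsets = list(product(range(-r, r + 1), repeat=3))
    out = {}
    for coord in features:                       # only active sites get outputs
        acc = None
        for off in offsets:
            nb = tuple(c + o for c, o in zip(coord, off))
            if nb in features:                   # skip empty neighbours entirely
                contrib = features[nb] @ weights[off]
                acc = contrib if acc is None else acc + contrib
        out[coord] = acc
    return out

rng = np.random.default_rng(0)
c_in, c_out = 4, 8
feats = {(0, 0, 0): rng.standard_normal(c_in), (1, 0, 0): rng.standard_normal(c_in)}
w = {off: rng.standard_normal((c_in, c_out))
     for off in product(range(-1, 2), repeat=3)}
print({k: v.shape for k, v in submanifold_conv3d(feats, w).items()})
```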

An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers

no code implementations • 12 Aug 2022 • Chao Fang, Aojun Zhou, Zhongfeng Wang

(1) From the algorithm perspective, we propose a sparsity inheritance mechanism along with an inherited dynamic pruning (IDP) method to rapidly obtain a series of N:M sparse candidate Transformers.

Computational Efficiency • Model Compression

GANDSE: Generative Adversarial Network based Design Space Exploration for Neural Network Accelerator Design

no code implementations • 1 Aug 2022 • Lang Feng, Wenjian Liu, Chuliang Guo, Ke Tang, Cheng Zhuo, Zhongfeng Wang

To improve design quality while reducing cost, design automation for neural network accelerators has been proposed, in which design space exploration algorithms automatically search for an optimized accelerator design within a design space.

Generative Adversarial Network
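To give a flavour of what design space exploration means here, below is a trivial random-search loop over a made-up accelerator design space (PE array size, buffer size) with a toy cost model; GANDSE's GAN-based search and its actual design space are not reproduced, and all parameters are hypothetical.

```python
import random

# Hypothetical, tiny design space: PE array dimensions and on-chip buffer size.
SPACE = {"pe_rows": [4, 8, 16, 32], "pe_cols": [4, 8, 16, 32],
         "buffer_kb": [64, 128, 256, 512]}

def toy_cost(design, workload_macs=1e9, bandwidth_penalty=1e-3):
    """Made-up cost model: latency estimate plus an area proxy."""
    pes = design["pe_rows"] * design["pe_cols"]
    latency = workload_macs / pes                     # more PEs -> fewer cycles
    latency += bandwidth_penalty * workload_macs / design["buffer_kb"]
    area = pes * 1.0 + design["buffer_kb"] * 0.5      # arbitrary area weights
    return latency + 0.01 * area

def random_search(n_trials=200, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        design = {k: rng.choice(v) for k, v in SPACE.items()}
        cost = toy_cost(design)
        if best is None or cost < best[0]:
            best = (cost, design)
    return best

print(random_search())
```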

Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression

no code implementations • 12 Oct 2021 • Zhuang Shao, Xiaoliang Chen, Li Du, Lei Chen, Yuan Du, Wei Zhuang, Huadong Wei, Chenjia Xie, Zhongfeng Wang

To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature maps.

Feature Compression • Quantization
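A quick calculation of why interlayer feature maps dominate on-chip memory: even one mid-network activation tensor at modest resolution runs to hundreds of kilobytes. The layer shape below is illustrative, not taken from the paper.

```python
def feature_map_bytes(h, w, c, bits=8):
    """On-chip buffer needed to hold one interlayer feature map."""
    return h * w * c * bits // 8

# Hypothetical early CNN layer: 112x112 spatial, 64 channels, 8-bit activations.
size = feature_map_bytes(112, 112, 64, bits=8)
print(f"{size / 1024:.0f} KiB")   # ~784 KiB for a single feature map
```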

Elbert: Fast Albert with Confidence-Window Based Early Exit

no code implementations • 1 Jul 2021 • Keli Xie, Siyuan Lu, Meiqi Wang, Zhongfeng Wang

Despite the great success in Natural Language Processing (NLP) area, large pre-trained language models like BERT are not well-suited for resource-constrained or real-time applications owing to the large number of parameters and slow inference speed.

Decision Making

Transform-Based Feature Map Compression for CNN Inference

no code implementations • 24 Jun 2021 • Yubo Shi, Meiqi Wang, Siyi Chen, Jinghe Wei, Zhongfeng Wang

To achieve higher accuracy in machine learning tasks, very deep convolutional neural networks (CNNs) have recently been designed.

Quantization

Training Deep Neural Networks Using Posit Number System

no code implementations • 6 Sep 2019 • Jinming Lu, Siyuan Lu, Zhisheng Wang, Chao Fang, Jun Lin, Zhongfeng Wang, Li Du

With the increasing size of Deep Neural Network (DNN) models, the high memory space requirements and computational complexity have become an obstacle for efficient DNN implementations.

Image Classification

Design Light-weight 3D Convolutional Networks for Video Recognition: Temporal Residual, Fully Separable Block, and Fast Algorithm

no code implementations • 31 May 2019 • Haonan Wang, Jun Lin, Zhongfeng Wang

Deep 3-dimensional (3D) Convolutional Networks (ConvNets) have shown promising performance on video recognition tasks because of their powerful spatio-temporal information fusion ability.

Video Recognition
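As a generic illustration of how a "fully separable" 3D convolution can be factored (per-channel spatial conv, per-channel temporal conv, then a pointwise conv), here is a small PyTorch sketch. It reflects the general technique named in the title, not necessarily the exact block proposed in the paper; channel counts and kernel sizes are illustrative.

```python
import torch
import torch.nn as nn

class FullySeparableConv3d(nn.Module):
    """3D conv factored into depthwise-spatial, depthwise-temporal, pointwise."""
    def __init__(self, channels, out_channels):
        super().__init__()
        # Per-channel spatial filtering (1 x k x k), no cross-channel mixing.
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), groups=channels)
        # Per-channel temporal filtering (k x 1 x 1).
        self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), groups=channels)
        # Pointwise conv mixes channels.
        self.pointwise = nn.Conv3d(channels, out_channels, kernel_size=1)

    def forward(self, x):                  # x: (batch, channels, T, H, W)
        return self.pointwise(self.temporal(self.spatial(x)))

block = FullySeparableConv3d(16, 32)
x = torch.randn(2, 16, 8, 28, 28)
print(block(x).shape)                      # torch.Size([2, 32, 8, 28, 28])
```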

SGAD: Soft-Guided Adaptively-Dropped Neural Network

no code implementations • 4 Jul 2018 • Zhisheng Wang, Fangxuan Sun, Jun Lin, Zhongfeng Wang, Bo Yuan

Based on the developed guideline and adaptive dropping mechanism, an innovative soft-guided adaptively-dropped (SGAD) neural network is proposed in this paper.

Model Compression

Intra-layer Nonuniform Quantization for Deep Convolutional Neural Network

no code implementations • 10 Jul 2016 • Fangxuan Sun, Jun Lin, Zhongfeng Wang

In this paper, an equal distance nonuniform quantization (ENQ) scheme and a K-means clustering nonuniform quantization (KNQ) scheme are proposed to reduce the required memory storage when low complexity hardware or software implementations are considered.

Clustering • General Classification +5
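A small sketch contrasting the two flavours of quantization named above: an equal-step codebook spanning the weight range versus a K-means-learned nonuniform codebook, where only cluster indices plus a small codebook need to be stored. The bit-width and the plain Lloyd's iteration are illustrative, not the paper's exact ENQ/KNQ procedures.

```python
import numpy as np

def equal_step_quantize(w, bits=4):
    """Quantize with 2**bits equally spaced levels spanning the weight range."""
    levels = np.linspace(w.min(), w.max(), 2 ** bits)
    idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx], idx, levels

def kmeans_quantize(w, bits=4, iters=20, seed=0):
    """Nonuniform codebook learned with plain Lloyd's k-means on the weights."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(w, size=2 ** bits, replace=False)
    for _ in range(iters):
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(centroids.size):          # move each centroid to its mean
            if np.any(idx == k):
                centroids[k] = w[idx == k].mean()
    return centroids[idx], idx, centroids

w = np.random.default_rng(1).standard_normal(1024).astype(np.float32)
for name, (q, idx, codebook) in [("equal-step", equal_step_quantize(w)),
                                 ("k-means", kmeans_quantize(w))]:
    mse = float(np.mean((w - q) ** 2))
    print(f"{name}: {codebook.size} levels, indices stored in 4 bits, MSE={mse:.4f}")
```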
