Search Results for author: Zhongfeng Wang

Found 35 papers, 6 papers with code

View Dialogue in 2D: A Two-stream Model in Time-speaker Perspective for Dialogue Summarization and beyond

no code implementations • COLING 2022 • Keli Xie, Dongchen He, Jiaxin Zhuang, Siyuan Lu, Zhongfeng Wang

To better capture the dialogue information, we propose a 2D view of dialogue based on a time-speaker perspective, where the time and speaker streams of dialogue can be obtained as strengthened input.

Document Summarization • Machine Reading Comprehension
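A rough sketch of how such a time-speaker view could be assembled from raw dialogue turns is given below. The stream construction, speaker handling, and data format are illustrative assumptions, not the paper's actual preprocessing.

```python
# Illustrative sketch only: one plausible way to derive "time" and "speaker"
# streams from a dialogue, loosely following the 2D time-speaker idea above.
# The actual construction used in the paper may differ.
from collections import defaultdict

def build_streams(turns):
    """turns: list of (speaker, utterance) tuples in chronological order."""
    # Time stream: utterances kept in their original temporal order.
    time_stream = [f"{spk}: {utt}" for spk, utt in turns]
    # Speaker streams: utterances regrouped per speaker, preserving order.
    speaker_streams = defaultdict(list)
    for spk, utt in turns:
        speaker_streams[spk].append(utt)
    return time_stream, dict(speaker_streams)

turns = [("Alice", "Shall we move the deadline?"),
         ("Bob", "Yes, Friday works."),
         ("Alice", "Great, I will update the doc.")]
time_stream, speaker_streams = build_streams(turns)
print(time_stream)
print(speaker_streams)
```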

Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format

no code implementations • 24 Nov 2024 • Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst

To address these challenges, we investigate the sensitivity of activation precision across various LLM modules and its impact on overall model accuracy.

TaQ-DiT: Time-aware Quantization for Diffusion Transformers

no code implementations • 21 Nov 2024 • Xinyan Liu, Huihong Shi, Yang Xu, Zhongfeng Wang

Transformer-based diffusion models, dubbed Diffusion Transformers (DiTs), have achieved state-of-the-art performance in image and video generation tasks.

Denoising • Model Compression • +2

M$^2$-ViT: Accelerating Hybrid Vision Transformers with Two-Level Mixed Quantization

no code implementations • 10 Oct 2024 • Yanbiao Liang, Huihong Shi, Zhongfeng Wang

While prior work has explored quantization to marry the benefits of efficient hybrid ViT architectures with those of low-precision arithmetic, it focuses on uniform quantization and overlooks the potential advantages of mixed quantization.

Efficient ViTs • Quantization
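For readers unfamiliar with the distinction, the toy NumPy sketch below contrasts uniform and mixed-precision weight quantization at the same average bit budget. The layer names, value distributions, and bit-widths are invented for illustration and do not reflect M$^2$-ViT's two-level scheme.

```python
# Toy illustration of uniform vs. mixed-precision weight quantization. This is
# NOT M^2-ViT's two-level scheme; it only shows why spending more bits on a
# sensitive layer and fewer on a robust one can beat a single uniform
# bit-width at the same average budget. All layer names/bit-widths are made up.
import numpy as np

def quantize(w, bits):
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
layers = {"attn_proj": rng.normal(0.0, 1.0, 10000),   # wide value range
          "mlp_fc":    rng.normal(0.0, 0.05, 10000)}  # narrow value range

uniform_bits = {"attn_proj": 6, "mlp_fc": 6}           # 6 bits everywhere
mixed_bits   = {"attn_proj": 8, "mlp_fc": 4}           # same average budget

for scheme, bits in [("uniform", uniform_bits), ("mixed", mixed_bits)]:
    total_mse = sum(np.mean((w - quantize(w, bits[name])) ** 2)
                    for name, w in layers.items())
    print(f"{scheme:7s} quantization, total MSE: {total_mse:.2e}")
```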

NASH: Neural Architecture and Accelerator Search for Multiplication-Reduced Hybrid Models

no code implementations • 7 Sep 2024 • Yang Xu, Huihong Shi, Zhongfeng Wang

To overcome these limitations, we propose NASH, a Neural architecture and Accelerator Search framework for multiplication-reduced Hybrid models.

Neural Architecture Search

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment

no code implementations • 16 Jul 2024 • Yuhao Ji, Chao Fang, Shaobo Ma, Haikuo Shao, Zhongfeng Wang

Transformer models have revolutionized AI tasks, but their large size hinders real-world deployment on resource-constrained and latency-critical edge devices.

Quantization • Scheduling

P$^2$-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer

1 code implementation • 30 May 2024 • Huihong Shi, Xin Cheng, Wendong Mao, Zhongfeng Wang

Vision Transformers (ViTs) have excelled in computer vision tasks but are memory-consuming and computation-intensive, challenging their deployment on resource-constrained devices.

Quantization
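The sketch below illustrates the general idea behind power-of-two scaling in post-training quantization, where rounding the scale to a power of two turns rescaling into a bit shift. It is a generic illustration under assumed tensor statistics, not P$^2$-ViT's actual algorithm.

```python
# Minimal sketch of power-of-two (PoT) scaling for post-training quantization,
# in the spirit of the paper's title; the actual P^2-ViT procedure is more
# involved. Rounding the scale to a power of two lets dequantization/rescaling
# be implemented as a cheap bit shift in hardware.
import numpy as np

def pot_quantize(x, bits=8):
    qmax = 2 ** (bits - 1) - 1
    raw_scale = np.abs(x).max() / qmax
    # Round the floating-point scale to the nearest power of two.
    pot_exp = int(np.round(np.log2(raw_scale)))
    scale = 2.0 ** pot_exp
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, pot_exp  # integer tensor + shift amount

rng = np.random.default_rng(0)
x = rng.normal(0, 0.2, 4096)
q, exp = pot_quantize(x)
x_hat = q.astype(np.float32) * (2.0 ** exp)   # in hardware: a shift by `exp`
print("shift exponent:", exp, "mean abs error:", np.abs(x - x_hat).mean())
```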

Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer

1 code implementation • 6 May 2024 • Huihong Shi, Haikuo Shao, Wendong Mao, Zhongfeng Wang

Motivated by the huge success of Transformers in the field of natural language processing (NLP), Vision Transformers (ViTs) have been rapidly developed and achieved remarkable performance in various computer vision tasks.

Efficient ViTs • Model Compression • +1

An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes

no code implementations • 22 Feb 2024 • Miaoxin Wang, Xiao Wu, Jun Lin, Zhongfeng Wang

Particularly, it demonstrates efficient support for large-kernel CNNs, achieving throughputs of 169.68 GOPS and 244.55 GOPS for RepLKNet-31 and PyConvResNet-50, respectively, both of which are implemented on hardware for the first time.

Computational Efficiency

BETA: Binarized Energy-Efficient Transformer Accelerator at the Edge

no code implementations • 22 Jan 2024 • Yuhao Ji, Chao Fang, Zhongfeng Wang

Existing binary Transformers are promising in edge deployment due to their compact model size, low computational complexity, and considerable inference accuracy.
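The compactness argument can be made concrete with a classic XNOR-Net-style weight binarization, sketched below. This is a generic illustration (random weights, no fine-tuning), not BETA's quantization scheme or dataflow.

```python
# Why binary Transformers are compact/cheap: classic 1-bit weight
# binarization (sign + per-row scale), in the style of XNOR-Net. Weights
# shrink ~32x and the matmul reduces to additions/subtractions; real binary
# Transformers recover accuracy via dedicated training, which is omitted here.
import numpy as np

def binarize_weights(W):
    """W: (out, in) float matrix -> {-1,+1} matrix plus per-row scale alpha."""
    alpha = np.abs(W).mean(axis=1, keepdims=True)   # per-output-row scale
    B = np.sign(W)
    B[B == 0] = 1.0
    return B, alpha

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (64, 128))
x = rng.normal(0, 1.0, 128)

B, alpha = binarize_weights(W)
y_full = W @ x                       # FP32 matmul
y_bin = alpha.ravel() * (B @ x)      # +/-1 matmul (add/sub only) then rescale
print("relative error:", np.linalg.norm(y_full - y_bin) / np.linalg.norm(y_full))
```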

Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design

no code implementations • 22 Sep 2023 • Chao Fang, Wei Sun, Aojun Zhou, Zhongfeng Wang

At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights during both forward and backward passes of DNN training, which can significantly reduce the computational cost while maintaining model accuracy.

Computational Efficiency • Scheduling
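The N:M structure that BDWP builds on can be illustrated with a short sketch: in every group of M consecutive weights, only the N largest-magnitude entries are kept. The bidirectional forward/backward pruning itself is not reproduced here.

```python
# Minimal sketch of N:M structured sparsity (here 2:4): in every group of
# M consecutive weights, keep the N largest-magnitude entries and zero the
# rest. BDWP's bidirectional (forward + backward) pruning builds on this
# structure but is not reproduced here.
import numpy as np

def nm_prune(w, n=2, m=4):
    w = w.reshape(-1, m).copy()
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=16)
w_sparse = nm_prune(w, n=2, m=4)
print(w.round(2))
print(w_sparse.round(2))            # exactly 2 nonzeros per group of 4
print("density:", np.count_nonzero(w_sparse) / w_sparse.size)  # -> 0.5
```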

A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge

no code implementations • 15 Sep 2023 • Longwei Huang, Chao Fang, Qiong Li, Jun Lin, Zhongfeng Wang

However, many edge devices struggle to boost inference throughput of various quantized DNNs due to the varying quantization levels, and these devices lack floating-point (FP) support for on-device learning, which prevents them from improving model accuracy while ensuring data privacy.

Quantization

S2R: Exploring a Double-Win Transformer-Based Framework for Ideal and Blind Super-Resolution

1 code implementation • 16 Aug 2023 • Minghao She, Wendong Mao, Huihong Shi, Zhongfeng Wang

In this paper, we propose a double-win framework for ideal and blind SR tasks, named S2R, comprising a light-weight transformer-based SR model (the S2R transformer) and a novel coarse-to-fine training strategy, which achieves excellent visual results under both ideal and random fuzzy conditions.

Blind Super-Resolution • Super-Resolution • +1

Fast and Accurate FSA System Using ELBERT: An Efficient and Lightweight BERT

no code implementations • 16 Nov 2022 • Siyuan Lu, Chenchen Zhou, Keli Xie, Jun Lin, Zhongfeng Wang

Based on ELBERT, an innovative method to accelerate text processing on the GPU platform is developed, solving the difficult problem of making the early exit mechanism work more effectively with a large input batch size.

Sentiment Analysis
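The basic early-exit mechanism referred to above can be sketched as follows; ELBERT's confidence-window criterion and its large-batch GPU scheduling are more elaborate, and the exit heads and threshold below are illustrative assumptions.

```python
# Generic early-exit sketch: run intermediate classifiers layer by layer and
# stop as soon as the prediction confidence exceeds a threshold. ELBERT's
# confidence-window criterion and its large-batch GPU scheduling are more
# elaborate; this only conveys the basic mechanism.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def early_exit_predict(layer_logits, threshold=0.9):
    """layer_logits: list of logit vectors, one per exit point (shallow->deep)."""
    for depth, logits in enumerate(layer_logits, start=1):
        probs = softmax(logits)
        if probs.max() >= threshold:          # confident enough -> exit early
            return int(probs.argmax()), depth
    return int(probs.argmax()), depth          # fall back to the final exit

# Hypothetical logits from three exit heads for an "easy" input.
logits_per_exit = [np.array([1.2, 0.9]), np.array([3.5, 0.2]), np.array([6.0, -1.0])]
label, used = early_exit_predict(logits_per_exit, threshold=0.9)
print(f"predicted class {label} after {used} of {len(logits_per_exit)} exits")
```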

ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention

1 code implementation • 9 Nov 2022 • Jyotikrishna Dass, Shang Wu, Huihong Shi, Chaojian Li, Zhifan Ye, Zhongfeng Wang, Yingyan Lin

Unlike sparsity-based Transformer accelerators for NLP, ViTALiTy unifies both low-rank and sparse components of the attention in ViTs.
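The low-rank-plus-sparse view of attention can be illustrated with a small NumPy sketch that approximates a softmax attention matrix by a truncated SVD plus a few large residual entries. ViTALiTy's linear Taylor attention is a specific trainable construction; the decomposition below is only a generic illustration.

```python
# Sketch of the "low-rank + sparse" view of attention: approximate the softmax
# attention matrix by a cheap low-rank part and keep only a few large residual
# entries as a sparse correction. ViTALiTy's linear Taylor attention is a
# specific, trainable instance of this idea; here the low-rank part is simply
# a truncated SVD of the attention matrix.
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 8
Q, K = rng.normal(size=(n, d)), rng.normal(size=(n, d))

scores = Q @ K.T / np.sqrt(d)
A = np.exp(scores - scores.max(axis=1, keepdims=True))
A = A / A.sum(axis=1, keepdims=True)          # softmax attention matrix

# Low-rank part: rank-2 truncated SVD of A.
U, S, Vt = np.linalg.svd(A)
A_lowrank = (U[:, :2] * S[:2]) @ Vt[:2]

# Sparse part: keep the 5% largest-magnitude residual entries.
R = A - A_lowrank
thresh = np.quantile(np.abs(R), 0.95)
A_sparse = np.where(np.abs(R) >= thresh, R, 0.0)

approx = A_lowrank + A_sparse
print("rel. error, low-rank only  :", np.linalg.norm(R) / np.linalg.norm(A))
print("rel. error, low-rank+sparse:", np.linalg.norm(A - approx) / np.linalg.norm(A))
```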

An Efficient FPGA-based Accelerator for Deep Forest

no code implementations • 4 Nov 2022 • Mingyu Zhu, Jiapeng Luo, Wendong Mao, Zhongfeng Wang

In this paper, an efficient hardware accelerator is proposed for deep forest models, which is also the first work to implement Deep Forest on FPGA.

BEBERT: Efficient and Robust Binary Ensemble BERT

1 code implementation • 28 Oct 2022 • Jiayi Tian, Chao Fang, Haonan Wang, Zhongfeng Wang

Pre-trained BERT models have achieved impressive accuracy on natural language processing (NLP) tasks.

Binarization • Computational Efficiency • +1

NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks

2 code implementations • 24 Oct 2022 • Huihong Shi, Haoran You, Yang Zhao, Zhongfeng Wang, Yingyan Lin

Multiplication is arguably the most cost-dominant operation in modern deep neural networks (DNNs), limiting their achievable efficiency and thus more extensive deployment in resource-constrained applications.

Neural Architecture Search

Accelerate Three-Dimensional Generative Adversarial Networks Using Fast Algorithm

no code implementations • 18 Oct 2022 • Ziqi Su, Wendong Mao, Zhongfeng Wang, Jun Lin, WenQiang Wang, Haitao Sun

3D deconvolution (DeConv), as an important computation of 3D-GAN, significantly increases computational complexity compared with 2D DeConv.

Computational Efficiency

An Efficient FPGA Accelerator for Point Cloud

no code implementations • 14 Oct 2022 • Zilun Wang, Wendong Mao, Peixiang Yang, Zhongfeng Wang, Jun Lin

The submanifold sparse convolutional network (SSCN) has been widely used for point cloud processing due to its unique advantages in terms of visual results.

Autonomous Driving • Computational Efficiency

An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers

no code implementations • 12 Aug 2022 • Chao Fang, Aojun Zhou, Zhongfeng Wang

(1) From algorithm perspective, we propose a sparsity inheritance mechanism along with an inherited dynamic pruning (IDP) method to obtain a series of N:M sparse candidate Transformers rapidly.

Computational Efficiency • Model Compression

GANDSE: Generative Adversarial Network based Design Space Exploration for Neural Network Accelerator Design

no code implementations • 1 Aug 2022 • Lang Feng, Wenjian Liu, Chuliang Guo, Ke Tang, Cheng Zhuo, Zhongfeng Wang

To improve the design quality while saving the cost, design automation for neural network accelerators was proposed, where design space exploration algorithms are used to automatically search the optimized accelerator design within a design space.

Deep Learning • Deep Reinforcement Learning • +1

Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression

no code implementations • 12 Oct 2021 • Zhuang Shao, Xiaoliang Chen, Li Du, Lei Chen, Yuan Du, Wei Zhuang, Huadong Wei, Chenjia Xie, Zhongfeng Wang

To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature maps.

Feature Compression • Quantization

Elbert: Fast Albert with Confidence-Window Based Early Exit

no code implementations • 1 Jul 2021 • Keli Xie, Siyuan Lu, Meiqi Wang, Zhongfeng Wang

Despite the great success in Natural Language Processing (NLP) area, large pre-trained language models like BERT are not well-suited for resource-constrained or real-time applications owing to the large number of parameters and slow inference speed.

Decision Making

Transform-Based Feature Map Compression for CNN Inference

no code implementations • 24 Jun 2021 • Yubo Shi, Meiqi Wang, Siyi Chen, Jinghe Wei, Zhongfeng Wang

To achieve higher accuracy in machine learning tasks, very deep convolutional neural networks (CNNs) have been designed in recent years.

Quantization

Training Deep Neural Networks Using Posit Number System

no code implementations • 6 Sep 2019 • Jinming Lu, Siyuan Lu, Zhisheng Wang, Chao Fang, Jun Lin, Zhongfeng Wang, Li Du

With the increasing size of Deep Neural Network (DNN) models, the high memory space requirements and computational complexity have become an obstacle for efficient DNN implementations.

Image Classification

Design Light-weight 3D Convolutional Networks for Video Recognition: Temporal Residual, Fully Separable Block, and Fast Algorithm

no code implementations • 31 May 2019 • Haonan Wang, Jun Lin, Zhongfeng Wang

Deep 3-dimensional (3D) Convolutional Network (ConvNet) has shown promising performance on video recognition tasks because of its powerful spatio-temporal information fusion ability.

Video Recognition

SGAD: Soft-Guided Adaptively-Dropped Neural Network

no code implementations • 4 Jul 2018 • Zhisheng Wang, Fangxuan Sun, Jun Lin, Zhongfeng Wang, Bo Yuan

Based on the developed guideline and adaptive dropping mechanism, an innovative soft-guided adaptively-dropped (SGAD) neural network is proposed in this paper.

Model Compression

Intra-layer Nonuniform Quantization for Deep Convolutional Neural Network

no code implementations • 10 Jul 2016 • Fangxuan Sun, Jun Lin, Zhongfeng Wang

In this paper, an equal distance nonuniform quantization (ENQ) scheme and a K-means clustering nonuniform quantization (KNQ) scheme are proposed to reduce the required memory storage when low complexity hardware or software implementations are considered.

Clustering • General Classification • +5
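The KNQ idea can be sketched as plain K-means over the weight values, storing a short index per weight plus a small codebook. The cluster count, initialization, and data below are illustrative assumptions; ENQ and the paper's exact procedure are not reproduced.

```python
# Minimal sketch of K-means-style nonuniform quantization (the idea behind
# KNQ): cluster the weight values, replace each weight by its cluster
# centroid, and store only a small index per weight plus a codebook.
import numpy as np

def kmeans_quantize(w, n_clusters=16, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = rng.choice(w, size=n_clusters, replace=False)
    for _ in range(iters):
        idx = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = w[idx == k].mean()
    idx = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
    return idx.astype(np.uint8), centroids      # 4-bit indices + codebook

rng = np.random.default_rng(1)
w = rng.normal(0, 0.05, 10000).astype(np.float32)
idx, codebook = kmeans_quantize(w, n_clusters=16)
w_hat = codebook[idx]
print("mean abs error:", np.abs(w - w_hat).mean())
print("storage: 4-bit index per weight + 16-entry codebook (vs. 32-bit floats)")
```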
