Search Results for author: Naigang Wang

Found 16 papers, 3 papers with code

CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization

no code implementations 30 Jan 2025 Yanxia Deng, Aozhong Zhang, Naigang Wang, Selcuk Gurses, Zi Yang, Penghang Yin

Fine-tuning large language models (LLMs) using low-rank adaptation (LoRA) has become a highly efficient approach for downstream tasks, particularly in scenarios with limited computational resources.

Arithmetic Reasoning · Text Generation
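
A minimal sketch of a plain LoRA adapter wrapped around a frozen linear layer, to make the low-rank update the abstract refers to concrete; it does not implement CLoQ's calibrated initialization, and the class name, rank, and alpha below are illustrative assumptions.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen base layer plus a trainable low-rank update (generic LoRA)."""
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():       # only the adapter is trained
                p.requires_grad = False
            self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x):
            # y = W x + (alpha / r) * B A x
            return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())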

Unlocking Real-Time Fluorescence Lifetime Imaging: Multi-Pixel Parallelism for FPGA-Accelerated Processing

no code implementations 9 Oct 2024 Ismail Erbas, Aporva Amarnath, Vikas Pandey, Karthik Swaminathan, Naigang Wang, Xavier Intes

The limited memory and computational resources on the FPGA require efficient scheduling of operations and memory allocation to deploy deep learning models for low-latency applications.

Knowledge Distillation · Scheduling

Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging

no code implementations 1 Oct 2024 Ismail Erbas, Vikas Pandey, Aporva Amarnath, Naigang Wang, Karthik Swaminathan, Stefan T. Radev, Xavier Intes

Fluorescence lifetime imaging (FLI) is an important technique for studying cellular environments and molecular interactions, but its real-time application is limited by slow data acquisition, which requires capturing large time-resolved images, and by complex post-processing using iterative fitting algorithms.

Computational Efficiency · Knowledge Distillation +2

MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization

1 code implementation 2 Jun 2024 Aozhong Zhang, Naigang Wang, Yanxia Deng, Xin Li, Zi Yang, Penghang Yin

For example, we achieve a Wikitext2 perplexity of 5.95 on the LLaMA2-70B model for per-channel INT2 weight quantization without incurring any inference overhead.

Quantization
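
As a point of reference for the result quoted above, a round-to-nearest, per-channel symmetric INT2 weight quantizer looks like the sketch below; MagR's contribution is the magnitude-reduction preprocessing applied before such a quantizer, which is not shown here.

    import torch

    def quantize_int2_per_channel(w: torch.Tensor):
        """w: (out_channels, in_features) -> fake-quantized weights, codes, scales."""
        scale = w.abs().amax(dim=1, keepdim=True) / 2.0       # one scale per output channel
        scale = scale.clamp(min=1e-8)
        q = torch.round(w / scale).clamp(-2, 1)               # signed 2-bit grid {-2, -1, 0, 1}
        return q * scale, q.to(torch.int8), scale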

A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts

no code implementations 26 May 2024 Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers

The sparsely gated mixture of experts (MoE) architecture sends different inputs to different subnetworks, i.e., experts, through trainable routers.

Binary Classification
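
A minimal sketch of the sparsely gated MoE pattern described above, with a trainable top-1 router; the expert-pruning criterion studied in the paper is not reproduced, and all sizes are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyMoE(nn.Module):
        def __init__(self, dim: int, num_experts: int = 4):
            super().__init__()
            self.router = nn.Linear(dim, num_experts)          # trainable router
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
                for _ in range(num_experts)
            ])

        def forward(self, x):                                  # x: (tokens, dim)
            gates = F.softmax(self.router(x), dim=-1)
            top_gate, top_idx = gates.max(dim=-1)              # top-1 routing
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                mask = top_idx == e                            # tokens routed to expert e
                if mask.any():
                    out[mask] = top_gate[mask, None] * expert(x[mask])
            return out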

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

1 code implementation 4 Apr 2024 Aniruddha Nrusimha, Mayank Mishra, Naigang Wang, Dan Alistarh, Rameswar Panda, Yoon Kim

We show that regularizing both the inputs and outputs is crucial for preventing the model from "migrating" the difficulty of input quantization into the weights, which would make post-training quantization (PTQ) of the weights more difficult.

Language Modeling · Language Modelling +1
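
A hedged sketch of the general idea of activation regularization: an auxiliary penalty on activation magnitudes at the input and output of a layer, added to the task loss during training. The exact regularizer used in the paper, and where it is applied, may differ from this illustration.

    import torch

    def activation_reg(x_in: torch.Tensor, x_out: torch.Tensor, lam: float = 1e-4):
        """Penalize large activations to discourage outlier channels (illustrative)."""
        return lam * (x_in.pow(2).mean() + x_out.pow(2).mean())

    # usage during training (illustrative):
    #   y = layer(x)
    #   loss = task_loss(y, target) + activation_reg(x, y)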

COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization

1 code implementation 11 Mar 2024 Aozhong Zhang, Zi Yang, Naigang Wang, Yingyong Qi, Jack Xin, Xin Li, Penghang Yin

Within a fixed layer, COMQ treats the scaling factor(s) and bit-codes as the variables of the reconstruction error.

Quantization
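
A simplified, backpropagation-free sketch of coordinate-wise minimization of a layer's reconstruction error for one output row: each sweep updates one weight at a time, choosing the code that best cancels the current residual. COMQ's actual update order, scale handling, and bit-width follow the paper; the 4-bit grid below is an assumption.

    import torch

    def coordinate_quantize_row(w, X, scale, n_sweeps: int = 2):
        """w: (d,) weight row; X: (n, d) calibration inputs; returns quantized row."""
        grid = scale * torch.arange(-8, 8, dtype=w.dtype)         # signed 4-bit codes
        w_q = scale * torch.round(w / scale).clamp(-8, 7)         # round-to-nearest init
        for _ in range(n_sweeps):
            for j in range(w.numel()):
                # residual of X @ w with coordinate j's quantized contribution removed
                residual = X @ (w - w_q) + X[:, j] * w_q[j]
                errs = ((residual[:, None] - X[:, j:j + 1] * grid[None, :]) ** 2).sum(dim=0)
                w_q[j] = grid[errs.argmin()]                      # best code for coordinate j
        return w_q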

4-bit Quantization of LSTM-based Speech Recognition Models

no code implementations 27 Aug 2021 Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang, Zoltán Tüske, Kailash Gopalakrishnan

We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts).

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

no code implementations 2 Mar 2021 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

Second, to effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.

Image Classification · Quantization +2
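
A hedged sketch of the dynamic block swapping idea described above: during a training forward pass, each block of the lower-precision student is randomly replaced by the matching block of the higher-precision teacher. The swap probability and function name are assumptions.

    import random
    import torch.nn as nn

    def forward_with_block_swapping(x, student_blocks: nn.ModuleList,
                                    teacher_blocks: nn.ModuleList, p_swap: float = 0.3):
        for s_blk, t_blk in zip(student_blocks, teacher_blocks):
            blk = t_blk if random.random() < p_swap else s_blk   # per-block random choice
            x = blk(x)
        return x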

A Comprehensive Survey on Hardware-Aware Neural Architecture Search

no code implementations 22 Jan 2021 Hadjer Benmeziane, Kaoutar El Maghraoui, Hamza Ouarnoughi, Smail Niar, Martin Wistuba, Naigang Wang

Arguably, their most significant impact has been in image classification and object detection tasks, where state-of-the-art results have been obtained.

Hardware Aware Neural Architecture Search · Image Classification +4

Ultra-Low Precision 4-bit Training of Deep Neural Networks

no code implementations NeurIPS 2020 Xiao Sun, Naigang Wang, Chia-Yu Chen, Jiamin Ni, Ankur Agrawal, Xiaodong Cui, Swagath Venkataramani, Kaoutar El Maghraoui, Vijayalakshmi (Viji) Srinivasan, Kailash Gopalakrishnan

In this paper, we propose a number of novel techniques and numerical representation formats that enable, for the very first time, the precision of training systems to be aggressively scaled from 8 bits to 4 bits.

Quantization
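
A generic quantize-dequantize stand-in for 4-bit training with a straight-through gradient; the paper's proposed 4-bit numerical formats (in particular for gradients) are more elaborate than this uniform signed grid, which is only an assumption for illustration.

    import torch

    class FakeQuant4(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            scale = x.abs().amax().clamp(min=1e-8) / 7.0          # symmetric signed 4-bit
            return torch.round(x / scale).clamp(-8, 7) * scale

        @staticmethod
        def backward(ctx, grad_out):
            return grad_out                                       # straight-through estimator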

The Sooner The Better: Investigating Structure of Early Winning Lottery Tickets

no code implementations 25 Sep 2019 Shihui Yin, Kyu-Hyoun Kim, Jinwook Oh, Naigang Wang, Mauricio Serrano, Jae-sun Seo, Jungwook Choi

In the case of ResNet50 on ImageNet, this yields a winning ticket with 75.02% Top-1 accuracy at an 80% pruning rate in only 22% of the total epochs of iterative pruning.

Memorization
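
A generic iterative magnitude-pruning helper, included only to make the "winning ticket" terminology above concrete; the paper's analysis of early-ticket structure is not reproduced, and the rewind-and-retrain loop is left as a comment.

    import torch

    def magnitude_mask(w: torch.Tensor, sparsity: float) -> torch.Tensor:
        """Zero out the smallest-magnitude `sparsity` fraction of w (illustrative)."""
        n_prune = int(w.numel() * sparsity)
        if n_prune == 0:
            return torch.ones_like(w)
        thresh = w.abs().flatten().kthvalue(n_prune).values
        return (w.abs() > thresh).float()

    # iterative pruning loop (sketch): train briefly, compute masks, rewind the
    # surviving weights to their early values, and repeat until the target sparsity.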

Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks

no code implementations ICLR 2019 Charbel Sakr, Naigang Wang, Chia-Yu Chen, Jungwook Choi, Ankur Agrawal, Naresh Shanbhag, Kailash Gopalakrishnan

Observing that a bad choice for accumulation precision results in loss of information that manifests itself as a reduction in variance in an ensemble of partial sums, we derive a set of equations that relate this variance to the length of accumulation and the minimum number of bits needed for accumulation.
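
A small empirical probe of the effect the abstract analyzes: summing a long vector with a reduced-precision accumulator and comparing the spread of the results across random draws against a higher-precision reference. The paper treats this analytically; the dtypes and lengths below are arbitrary choices for illustration.

    import numpy as np

    def accumulate(x: np.ndarray, acc_dtype) -> float:
        """Sequentially sum x with an accumulator of the given dtype."""
        acc = acc_dtype(0.0)
        for v in x:
            acc = acc_dtype(acc + acc_dtype(v))    # rounding after every addition
        return float(acc)

    # e.g. compare np.var([accumulate(np.random.randn(4096), np.float16) for _ in range(100)])
    #      against the same statistic with np.float32 as the accumulator dtype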

Training Deep Neural Networks with 8-bit Floating Point Numbers

no code implementations NeurIPS 2018 Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, Kailash Gopalakrishnan

The state-of-the-art hardware platforms for training Deep Neural Networks (DNNs) are moving from traditional single precision (32-bit) computations towards 16 bits of precision -- in large part due to the high energy efficiency and smaller bit storage associated with using reduced-precision representations.
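
A rough simulation of rounding values to a floating-point format with a short mantissa, in the spirit of the reduced-precision training described above; the (sign, exponent, mantissa) split is an assumption, and denormals and overflow are not modeled.

    import numpy as np

    def round_to_short_mantissa(x: np.ndarray, mantissa_bits: int = 2) -> np.ndarray:
        """Round each value to the nearest float with `mantissa_bits` explicit mantissa bits."""
        m, e = np.frexp(x)                         # x = m * 2**e, with 0.5 <= |m| < 1
        step = 2.0 ** (mantissa_bits + 1)
        return np.ldexp(np.round(m * step) / step, e)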
