Search Results for author: Chia-Yu Chen

Found 9 papers, 0 papers with code

Attention-based Learning for Sleep Apnea and Limb Movement Detection using Wi-Fi CSI Signals

no code implementations • 26 Mar 2023 • Chi-Che Chang, An-Hung Hsiao, Li-Hsiang Shen, Kai-Ten Feng, Chia-Yu Chen

In this paper, we propose the attention-based learning for sleep apnea and limb movement detection (ALESAL) system that can jointly detect sleep apnea and PLMD under different sleep postures across a variety of patients.

Paper
Add Code

Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization

no code implementations • 16 Jun 2022 • Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Kailash Gopalakrishnan

We report on aggressive quantization strategies that greatly accelerate inference of Recurrent Neural Network Transducers (RNN-T).

Language Modelling Model Compression +1

Paper
Add Code

4-bit Quantization of LSTM-based Speech Recognition Models

no code implementations • 27 Aug 2021 • Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei zhang, Zoltán Tüske, Kailash Gopalakrishnan

We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

no code implementations • NeurIPS 2020 • Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei zhang, Kailash Gopalakrishnan

Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained.

Paper
Add Code

Ultra-Low Precision 4-bit Training of Deep Neural Networks

no code implementations • NeurIPS 2020 • Xiao Sun, Naigang Wang, Chia-Yu Chen, Jiamin Ni, Ankur Agrawal, Xiaodong Cui, Swagath Venkataramani, Kaoutar El Maghraoui, Vijayalakshmi (Viji) Srinivasan, Kailash Gopalakrishnan

In this paper, we propose a number of novel techniques and numerical representation formats that enable, for the very first time, the precision of training systems to be aggressively scaled from 8-bits to 4-bits.

Quantization

Paper
Add Code

Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks

no code implementations • NeurIPS 2019 • Xiao Sun, Jungwook Choi, Chia-Yu Chen, Naigang Wang, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Xiaodong Cui, Wei zhang, Kailash Gopalakrishnan

Reducing the numerical precision of data and computation is extremely effective in accelerating deep learning training workloads.

Image Classification object-detection +1

Paper
Add Code

Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks

no code implementations • ICLR 2019 • Charbel Sakr, Naigang Wang, Chia-Yu Chen, Jungwook Choi, Ankur Agrawal, Naresh Shanbhag, Kailash Gopalakrishnan

Observing that a bad choice for accumulation precision results in loss of information that manifests itself as a reduction in variance in an ensemble of partial sums, we derive a set of equations that relate this variance to the length of accumulation and the minimum number of bits needed for accumulation.

Paper
Add Code

Training Deep Neural Networks with 8-bit Floating Point Numbers

no code implementations • NeurIPS 2018 • Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, Kailash Gopalakrishnan

The state-of-the-art hardware platforms for training Deep Neural Networks (DNNs) are moving from traditional single precision (32-bit) computations towards 16 bits of precision -- in large part due to the high energy efficiency and smaller bit storage associated with using reduced-precision representations.

Paper
Add Code

AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training

no code implementations • 7 Dec 2017 • Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei zhang, Kailash Gopalakrishnan

Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100 of TeraOps/s of computational capacity) is expected to be severely communication constrained.

Quantization

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.