no code implementations • 16 Jun 2022 • Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Kailash Gopalakrishnan
We report on aggressive quantization strategies that greatly accelerate inference of Recurrent Neural Network Transducers (RNN-T).
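The excerpt does not spell out the quantization scheme itself, so as a rough illustration of the kind of low-precision inference it refers to, here is a minimal sketch of symmetric per-tensor int8 quantization applied to a weight matrix and an activation, with the matrix multiply accumulated in int32. The scale choice, tensor shapes, and helper names are assumptions made for illustration, not the paper's RNN-T recipe.

```python
# Minimal sketch of symmetric per-tensor int8 quantization (illustrative only;
# not the RNN-T quantization strategy described in the paper).
import numpy as np

def quantize_int8(x):
    """Map a float tensor to int8 with a single symmetric scale."""
    max_abs = np.max(np.abs(x))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)   # hypothetical weight matrix
a = rng.normal(size=(1, 256)).astype(np.float32)     # hypothetical activation

qw, sw = quantize_int8(w)
qa, sa = quantize_int8(a)

# Integer matmul accumulated in int32, then rescaled back to float.
y_int8 = (qa.astype(np.int32) @ qw.astype(np.int32).T) * (sa * sw)
y_fp32 = a @ w.T
print("max abs error vs fp32:", np.max(np.abs(y_int8 - y_fp32)))
```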
no code implementations • 27 Aug 2021 • Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang, Zoltán Tüske, Kailash Gopalakrishnan
We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts).
no code implementations • NeurIPS 2020 • Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, Kailash Gopalakrishnan
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained.
no code implementations • 2 Mar 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko
Second, to effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
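The block-swapping idea lends itself to a short sketch: during a forward pass, some blocks of the lower-precision student are randomly replaced by the corresponding blocks of the higher-precision teacher. The toy block structure, swap probability, and use of plain linear layers below are assumptions for illustration; the paper's actual student/teacher architectures and precision handling are more involved.

```python
# Illustrative sketch of random block swapping between a lower-precision student
# and a higher-precision teacher (block structure and swap probability are assumed).
import random
import torch
import torch.nn as nn

class BlockNet(nn.Module):
    """A toy network made of interchangeable blocks with identical shapes."""
    def __init__(self, width=64, n_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(n_blocks)
        )
        self.head = nn.Linear(width, 10)

    def forward(self, x, override_blocks=None):
        # override_blocks maps a block index to a replacement module (e.g. a teacher block).
        for i, block in enumerate(self.blocks):
            block = override_blocks.get(i, block) if override_blocks else block
            x = block(x)
        return self.head(x)

teacher = BlockNet()   # stands in for the higher-precision network
student = BlockNet()   # stands in for the lower-precision network

def swapped_forward(x, swap_prob=0.5):
    # Randomly route some block positions through the teacher's corresponding blocks.
    overrides = {i: teacher.blocks[i]
                 for i in range(len(student.blocks)) if random.random() < swap_prob}
    return student(x, override_blocks=overrides)

out = swapped_forward(torch.randn(8, 64))
print(out.shape)  # torch.Size([8, 10])
```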
1 code implementation • NeurIPS 2020 • Yonggan Fu, Haoran You, Yang Zhao, Yue Wang, Chaojian Li, Kailash Gopalakrishnan, Zhangyang Wang, Yingyan Celine Lin
Recent breakthroughs in deep neural networks (DNNs) have fueled a tremendous demand for intelligent edge devices featuring on-site learning, while the practical realization of such systems remains a challenge due to the limited resources available at the edge and the required massive training costs for state-of-the-art (SOTA) DNNs.
no code implementations • NeurIPS 2020 • Xiao Sun, Naigang Wang, Chia-Yu Chen, Jiamin Ni, Ankur Agrawal, Xiaodong Cui, Swagath Venkataramani, Kaoutar El Maghraoui, Vijayalakshmi (Viji) Srinivasan, Kailash Gopalakrishnan
In this paper, we propose a number of novel techniques and numerical representation formats that enable, for the very first time, the precision of training systems to be aggressively scaled from 8-bits to 4-bits.
no code implementations • NeurIPS 2019 • Xiao Sun, Jungwook Choi, Chia-Yu Chen, Naigang Wang, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Xiaodong Cui, Wei Zhang, Kailash Gopalakrishnan
Reducing the numerical precision of data and computation is extremely effective in accelerating deep learning training workloads.
no code implementations • ICLR 2019 • Charbel Sakr, Naigang Wang, Chia-Yu Chen, Jungwook Choi, Ankur Agrawal, Naresh Shanbhag, Kailash Gopalakrishnan
Observing that a poor choice of accumulation precision causes a loss of information that manifests as reduced variance across an ensemble of partial sums, we derive a set of equations relating this variance to the accumulation length and the minimum number of bits needed for accumulation.
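The variance argument can be seen numerically: when a long sum is accumulated in a format whose spacing near the running total exceeds the size of the addends, later terms are swamped and the final sums of an ensemble bunch together. The toy experiment below, with an assumed ensemble size, sequence length, and fp16 as the reduced-precision accumulator, only illustrates this phenomenon; it does not reproduce the paper's analytical bit-width formula.

```python
# Toy demonstration of swamping: accumulating in fp16 collapses the ensemble
# variance of the final sums relative to fp64 accumulation (parameters are assumed).
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_terms = 100, 10000

sums_fp64, sums_fp16 = [], []
for _ in range(n_runs):
    terms = rng.normal(loc=1.0, scale=0.1, size=n_terms)  # positive-mean addends
    sums_fp64.append(terms.sum())
    acc = np.float16(0.0)
    for t in terms:
        # Round the running total back to fp16 after every addition,
        # mimicking a reduced-precision accumulator.
        acc = np.float16(float(acc) + t)
    sums_fp16.append(float(acc))

print("variance of fp64 sums:", np.var(sums_fp64))
print("variance of fp16 sums:", np.var(sums_fp16))  # far smaller: information was lost
```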
no code implementations • NeurIPS 2018 • Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, Kailash Gopalakrishnan
The state-of-the-art hardware platforms for training Deep Neural Networks (DNNs) are moving from traditional single precision (32-bit) computations towards 16 bits of precision -- in large part due to the high energy efficiency and smaller bit storage associated with using reduced-precision representations.
no code implementations • 17 Jul 2018 • Jungwook Choi, Pierce I-Jen Chuang, Zhuo Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost.
3 code implementations • ICLR 2018 • Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
We show, for the first time, that both weights and activations can be quantized to 4-bits of precision while still achieving accuracy comparable to full precision networks across a range of popular models and datasets.
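As a concrete but simplified picture of what 4-bit weights and activations mean in practice, the sketch below fake-quantizes an activation to 16 uniform levels inside a clipping range and uses a straight-through estimator so the operation can sit inside a trainable model. The clip value, level count, and STE formulation are common conventions assumed here for illustration, not a verbatim restatement of the paper's scheme.

```python
# Illustrative 4-bit uniform fake quantization with a straight-through estimator
# (clip range and rounding convention are assumptions, not the paper's exact method).
import torch

def fake_quant_4bit(x, clip_val=2.0, bits=4):
    """Quantize x to 2**bits uniform levels in [0, clip_val]; gradients pass straight through."""
    levels = 2 ** bits - 1
    x_clipped = torch.clamp(x, 0.0, clip_val)
    scale = clip_val / levels
    x_q = torch.round(x_clipped / scale) * scale
    # Straight-through estimator: forward uses x_q, backward sees identity on the clipped input.
    return x_clipped + (x_q - x_clipped).detach()

x = torch.randn(4, requires_grad=True)
y = fake_quant_4bit(torch.relu(x))   # quantize a ReLU activation to 4 bits
y.sum().backward()
print(y, x.grad)
```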
no code implementations • 7 Dec 2017 • Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei Zhang, Kailash Gopalakrishnan
Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained.
2 code implementations • 9 Feb 2015 • Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, Pritish Narayanan
Training of large-scale deep neural networks is often constrained by the available computational resources.
no code implementations • 9 Sep 2014 • Suyog Gupta, Vikas Sindhwani, Kailash Gopalakrishnan
This paper highlights new opportunities for designing large-scale machine learning systems that arise from blurring the traditional boundaries which have allowed algorithm designers and application-level practitioners to remain, for the most part, oblivious to the details of the underlying hardware-level implementations.