Search Results for author: Wonyong Sung

Found 37 papers, 6 papers with code

Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization

no code implementations • 9 Nov 2023 • Janghwan Lee, Minsoo Kim, SeungCheol Baek, Seok Joong Hwang, Wonyong Sung, Jungwook Choi

Large Language Models (LLMs) are proficient in natural language processing tasks, but their deployment is often restricted by extensive parameter sizes and computational demands.

Computational Efficiency Quantization

Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

1 code implementation • NeurIPS 2023 • Minsoo Kim, Sihwa Lee, Janghwan Lee, Sukjin Hong, Du-Seong Chang, Wonyong Sung, Jungwook Choi

Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning.

Arithmetic Reasoning Common Sense Reasoning +4

Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers

1 code implementation • 23 Feb 2023 • Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, Jungwook Choi

Pre-trained Transformer models such as BERT have shown great success in a wide range of applications, but at the cost of substantial increases in model complexity.

Knowledge Distillation Quantization

Sleep Model -- A Sequence Model for Predicting the Next Sleep Stage

no code implementations • 17 Feb 2023 • Iksoo Choi, Wonyong Sung

As sleep disorders are becoming more prevalent, there is an urgent need to classify sleep stages in a less disturbing way. In particular, sleep-stage classification using simple sensors, such as single-channel electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), or electrocardiography (ECG), has gained substantial interest.

Classification EEG +2

Exploring Attention Map Reuse for Efficient Transformer Neural Networks

no code implementations • 29 Jan 2023 • Kyuhong Shim, Jungwook Choi, Wonyong Sung

In this paper, we provide a comprehensive study on attention map reuse focusing on its ability to accelerate inference.

Speech Recognition

Macro-block dropout for improved regularization in training end-to-end speech recognition models

no code implementations • 29 Dec 2022 • Chanwoo Kim, Sathish Indurti, Jinhwan Park, Wonyong Sung

In our work, we define a macro-block that contains a large number of units from the input to a Recurrent Neural Network (RNN).

Speech Recognition

A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition

no code implementations • 1 Oct 2022 • Kyuhong Shim, Wonyong Sung

Our analyses show that Transformer and Conformer models benefit from the long-range accessibility of self-attention through input frames.

Speech Recognition

Similarity and Content-based Phonetic Self Attention for Speech Recognition

no code implementations • 19 Mar 2022 • Kyuhong Shim, Wonyong Sung

In particular, SA heads in lower layers capture various phonetic characteristics by the query-key dot product, which is designed to compute the pairwise relationship between frames.

Speech Recognition

Korean Tokenization for Beam Search Rescoring in Speech Recognition

no code implementations • 22 Feb 2022 • Kyuhong Shim, Hyewon Bae, Wonyong Sung

Although the common approach is to use the same tokenization method for external LM as the ASR model, we show that it may not be the best choice for Korean.

Automatic Speech Recognition (ASR) +2

TernGEMM: GEneral Matrix Multiply Library with Ternary Weights for Fast DNN Inference

1 code implementation • IEEE Workshop on Signal Processing Systems (SiPS) 2021 • Seokhyeon Choi, Kyuhong Shim, Jungwook Choi, Wonyong Sung, Byonghyo Shim

We propose TernGEMM, a special GEMM library using SIMD instructions for Deep Neural Network (DNN) inference with ternary weights and activations under 8-bit.

Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling

1 code implementation • 18th International SoC Design Conference (ISOCC) 2021 • Kyuhong Shim, Iksoo Choi, Wonyong Sung, Jungwook Choi

While Transformer-based models have shown impressive language modeling performance, the large computation cost is often prohibitive for practical use.

Language Modelling

Understanding the Role of Self Attention for Efficient Speech Recognition

no code implementations • ICLR 2022 • Kyuhong Shim, Jungwook Choi, Wonyong Sung

Self-attention (SA) is a critical component of Transformer neural networks that have succeeded in automatic speech recognition (ASR).

Automatic Speech Recognition (ASR) +1

Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks

no code implementations • 30 Sep 2020 • Yoonho Boo, Sungho Shin, Jungwook Choi, Wonyong Sung

In this study, we propose stochastic precision ensemble training for QDNNs (SPEQ).

Image Classification Quantization +3

S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima

no code implementations • 5 Sep 2020 • Wonyong Sung, Iksoo Choi, Jinhwan Park, Seokhyun Choi, Sungho Shin

The proposed method is compared with the conventional SGD method and previous weight-noise injection algorithms using convolutional neural networks for image classification.

Image Classification Scheduling

Quantized Neural Networks: Characterization and Holistic Optimization

no code implementations • 31 May 2020 • Yoonho Boo, Sungho Shin, Wonyong Sung

This study proposes a holistic approach for the optimization of QDNNs, which contains QDNN training methods as well as quantization-friendly architecture design.

Model Selection Quantization

SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks

no code implementations • 2 Feb 2020 • Sungho Shin, Yoonho Boo, Wonyong Sung

Model averaging is a promising approach for achieving good generalization capability in DNNs, especially when the loss surface for training contains many sharp minima.

Quantization

Knowledge distillation for optimization of quantized deep neural networks

no code implementations • 4 Sep 2019 • Sungho Shin, Yoonho Boo, Wonyong Sung

Knowledge distillation (KD) is a very popular method for model size reduction.

Knowledge Distillation

Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices

no code implementations • NeurIPS 2018 • Jinhwan Park, Yoonho Boo, Iksoo Choi, Sungho Shin, Wonyong Sung

The RNN implementation on embedded devices can suffer from excessive DRAM accesses because the parameter size of a neural network usually exceeds the capacity of the cache memory and the parameters are used only once for each time step.

Automatic Speech Recognition (ASR) +1

Exploration of Efficient On-Device Acoustic Modeling with Neural Networks

no code implementations • 27 Sep 2018 • Wonyong Sung, Lukas Lee, Jinhwan Park

In addition, we explore neural networks that apply one-dimensional (1-D) convolution at each layer of these algorithms, which yields a very large performance increase in the QRNNs and Gated ConvNets.

Speech Recognition

Single Stream Parallelization of Recurrent Neural Networks for Low Power and Fast Inference

no code implementations • 30 Mar 2018 • Wonyong Sung, Jinhwan Park

As neural network algorithms show high performance in many applications, their efficient inference on mobile and embedded systems is of great interest.

SVD-Softmax: Fast Softmax Approximation on Large Vocabulary Neural Networks

no code implementations • NeurIPS 2017 • Kyuhong Shim, Minjae Lee, Iksoo Choi, Yoonho Boo, Wonyong Sung

The approximate probability of each word can be estimated with only a small part of the weight matrix by using a few large singular values and the corresponding elements for most of the words.

Language Modelling Machine Translation +1

Structured Sparse Ternary Weight Coding of Deep Neural Networks for Efficient Hardware Implementations

no code implementations • 1 Jul 2017 • Yoonho Boo, Wonyong Sung

Deep neural networks (DNNs) usually demand a large number of operations for real-time inference.

Fixed-point optimization of deep neural networks with adaptive step size retraining

no code implementations • 27 Feb 2017 • Sungho Shin, Yoonho Boo, Wonyong Sung

Fixed-point optimization of deep neural networks plays an important role in hardware-based design and low-power implementations.

Quantization

Quantized neural network design under weight capacity constraint

no code implementations • 19 Nov 2016 • Sungho Shin, Kyuyeon Hwang, Wonyong Sung

The complexity of deep neural network algorithms for hardware implementation can be lowered either by scaling the number of units or reducing the word-length of weights.

Quantization

Compact Deep Convolutional Neural Networks With Coarse Pruning

no code implementations • 30 Oct 2016 • Sajid Anwar, Wonyong Sung

We propose feature map and kernel level pruning for reducing the computational complexity of a deep convolutional neural network.

Network Pruning

FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks

no code implementations • 30 Sep 2016 • Minjae Lee, Kyuyeon Hwang, Jinhwan Park, Sungwook Choi, Sungho Shin, Wonyong Sung

The weights are quantized to 6 bits to store all of them in the on-chip memory of an FPGA.

Language Modelling Speech Recognition +1

Character-Level Language Modeling with Hierarchical Recurrent Neural Networks

no code implementations • 13 Sep 2016 • Kyuyeon Hwang, Wonyong Sung

Recurrent neural network (RNN) based character-level language models (CLMs) are extremely useful for modeling out-of-vocabulary words by nature.

Language Modelling Speech Recognition +1

Dynamic Hand Gesture Recognition for Wearable Devices with Low Complexity Recurrent Neural Networks

no code implementations • 14 Aug 2016 • Sungho Shin, Wonyong Sung

Gesture recognition is an essential technology for many wearable devices.

Hand Gesture Recognition

Generative Knowledge Transfer for Neural Language Models

no code implementations • 14 Aug 2016 • Sungho Shin, Kyuyeon Hwang, Wonyong Sung

In this paper, we propose a generative knowledge transfer technique that trains an RNN-based language model (student network) using text and output probabilities generated from a previously trained RNN (teacher network).

Language Modelling Text Generation +1

FPGA Based Implementation of Deep Neural Networks Using On-chip Memory Only

no code implementations • 4 Feb 2016 • Jinhwan Park, Wonyong Sung

In this work, we have developed an FPGA-based fixed-point DNN system that uses only on-chip memory, avoiding accesses to external DRAM.

Handwritten Digit Recognition

Character-Level Incremental Speech Recognition with Recurrent Neural Networks

1 code implementation • 25 Jan 2016 • Kyuyeon Hwang, Wonyong Sung

The output values of the CTC-trained RNN are character-level probabilities, which are processed by beam search decoding.

Language Modelling Speech Recognition +1

Online Keyword Spotting with a Character-Level Recurrent Neural Network

no code implementations • 30 Dec 2015 • Kyuyeon Hwang, Minjae Lee, Wonyong Sung

In this paper, we propose a context-aware keyword spotting model employing a character-level recurrent neural network (RNN) for spoken term detection in continuous speech.

General Classification Keyword Spotting

Structured Pruning of Deep Convolutional Neural Networks

1 code implementation • 29 Dec 2015 • Sajid Anwar, Kyuyeon Hwang, Wonyong Sung

To decide the importance of network connections and paths, the proposed method uses a particle filtering approach.

Network Pruning

Fixed-Point Performance Analysis of Recurrent Neural Networks

no code implementations • 4 Dec 2015 • Sungho Shin, Kyuyeon Hwang, Wonyong Sung

Recurrent neural networks have shown excellent performance in many applications; however, they require increased complexity in hardware- or software-based implementations.

Language Modelling Quantization

Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification

no code implementations • 21 Nov 2015 • Kyuyeon Hwang, Wonyong Sung

Our online model achieves 20.7% phoneme error rate (PER) on the very long input sequence that is generated by concatenating all 192 utterances in the TIMIT core test set.

General Classification Rolling Shutter Correction +2

Resiliency of Deep Neural Networks under Quantization

no code implementations • 20 Nov 2015 • Wonyong Sung, Sungho Shin, Kyuyeon Hwang

In this work, the effects of retraining are analyzed for a feedforward deep neural network (FFDNN) and a convolutional neural network (CNN).

Quantization

Single stream parallelization of generalized LSTM-like RNNs on a GPU

no code implementations • 10 Mar 2015 • Kyuyeon Hwang, Wonyong Sung

Recurrent neural networks (RNNs) have shown outstanding performance on processing sequence data.
