Search Results for author: Wonyong Sung

Found 37 papers, 6 papers with code

Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization

no code implementations 9 Nov 2023 Jangwhan Lee, Minsoo Kim, SeungCheol Baek, Seok Joong Hwang, Wonyong Sung, Jungwook Choi

Large Language Models (LLMs) are proficient in natural language processing tasks, but their deployment is often restricted by extensive parameter sizes and computational demands.

Computational Efficiency, Quantization

Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers

1 code implementation 23 Feb 2023 Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, Jungwook Choi

Pre-trained Transformer models such as BERT have shown great success in a wide range of applications, but at the cost of substantial increases in model complexity.

Knowledge Distillation, Quantization

Sleep Model -- A Sequence Model for Predicting the Next Sleep Stage

no code implementations 17 Feb 2023 Iksoo Choi, Wonyong Sung

As sleep disorders become more prevalent, there is an urgent need to classify sleep stages in a less disturbing way. In particular, sleep-stage classification using simple sensors, such as single-channel electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), or electrocardiography (ECG), has gained substantial interest.

Classification, EEG +2

Exploring Attention Map Reuse for Efficient Transformer Neural Networks

no code implementations 29 Jan 2023 Kyuhong Shim, Jungwook Choi, Wonyong Sung

In this paper, we provide a comprehensive study on attention map reuse focusing on its ability to accelerate inference.

Speech Recognition

Macro-block dropout for improved regularization in training end-to-end speech recognition models

no code implementations 29 Dec 2022 Chanwoo Kim, Sathish Indurti, Jinhwan Park, Wonyong Sung

In our work, we define a macro-block that contains a large number of units from the input to a Recurrent Neural Network (RNN).

Speech Recognition

A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition

no code implementations 1 Oct 2022 Kyuhong Shim, Wonyong Sung

Our analyses show that Transformer and Conformer models benefit from the long-range accessibility of self-attention through input frames.

Speech Recognition

Similarity and Content-based Phonetic Self Attention for Speech Recognition

no code implementations 19 Mar 2022 Kyuhong Shim, Wonyong Sung

In particular, self-attention (SA) heads in lower layers capture various phonetic characteristics through the query-key dot product, which is designed to compute the pairwise relationship between frames.

Speech Recognition

Korean Tokenization for Beam Search Rescoring in Speech Recognition

no code implementations 22 Feb 2022 Kyuhong Shim, Hyewon Bae, Wonyong Sung

Although the common approach is to use the same tokenization method for the external LM as for the ASR model, we show that it may not be the best choice for Korean.

Automatic Speech Recognition (ASR) +2

TernGEMM: GEneral Matrix Multiply Library with Ternary Weights for Fast DNN Inference

1 code implementation 2021 IEEE Workshop on Signal Processing Systems (SiPS) 2021 Seokhyeon Choi, Kyuhong Shim, Jungwook Choi, Wonyong Sung, Byonghyo Shim

We propose TernGEMM, a special GEMM library that uses SIMD instructions for Deep Neural Network (DNN) inference with ternary weights and activations under 8 bits.

Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling

1 code implementation 2021 18th International SoC Design Conference (ISOCC) 2021 Kyuhong Shim, Iksoo Choi, Wonyong Sung, Jungwook Choi

While Transformer-based models have shown impressive language modeling performance, the large computation cost is often prohibitive for practical use.

Language Modelling

S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima

no code implementations 5 Sep 2020 Wonyong Sung, Iksoo Choi, Jinhwan Park, Seokhyun Choi, Sungho Shin

The proposed method is compared with the conventional SGD method and previous weight-noise injection algorithms using convolutional neural networks for image classification.

Image Classification, Scheduling

Quantized Neural Networks: Characterization and Holistic Optimization

no code implementations 31 May 2020 Yoonho Boo, Sungho Shin, Wonyong Sung

This study proposes a holistic approach for the optimization of quantized deep neural networks (QDNNs), which includes QDNN training methods as well as quantization-friendly architecture design.

Model Selection, Quantization

SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks

no code implementations 2 Feb 2020 Sungho Shin, Yoonho Boo, Wonyong Sung

Model averaging is a promising approach for achieving good generalization capability in DNNs, especially when the loss surface for training contains many sharp minima.

Quantization

Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices

no code implementations NeurIPS 2018 Jinhwan Park, Yoonho Boo, Iksoo Choi, Sungho Shin, Wonyong Sung

The RNN implementation on embedded devices can suffer from excessive DRAM accesses because the parameter size of a neural network usually exceeds that of the cache memory and the parameters are used only once for each time step.

Automatic Speech Recognition (ASR) +1

Exploration of Efficient On-Device Acoustic Modeling with Neural Networks

no code implementations 27 Sep 2018 Wonyong Sung, Lukas Lee, Jinwhan Park

In addition, we explore neural networks that add one-dimensional (1-D) convolution at each layer of these algorithms, which yields a very large performance increase in the QRNNs and Gated ConvNets.

Speech Recognition

Single Stream Parallelization of Recurrent Neural Networks for Low Power and Fast Inference

no code implementations 30 Mar 2018 Wonyong Sung, Jinhwan Park

As neural network algorithms show high performance in many applications, their efficient inference on mobile and embedded systems is of great interest.

SVD-Softmax: Fast Softmax Approximation on Large Vocabulary Neural Networks

no code implementations NeurIPS 2017 Kyuhong Shim, Minjae Lee, Iksoo Choi, Yoonho Boo, Wonyong Sung

The approximate probability of each word can be estimated with only a small part of the weight matrix by using a few large singular values and the corresponding elements for most of the words.

Language Modelling, Machine Translation +1

Structured Sparse Ternary Weight Coding of Deep Neural Networks for Efficient Hardware Implementations

no code implementations 1 Jul 2017 Yoonho Boo, Wonyong Sung

Deep neural networks (DNNs) usually demand a large number of operations for real-time inference.

Fixed-point optimization of deep neural networks with adaptive step size retraining

no code implementations 27 Feb 2017 Sungho Shin, Yoonho Boo, Wonyong Sung

Fixed-point optimization of deep neural networks plays an important role in hardware-based design and low-power implementations.

Quantization

Quantized neural network design under weight capacity constraint

no code implementations 19 Nov 2016 Sungho Shin, Kyuyeon Hwang, Wonyong Sung

The complexity of deep neural network algorithms for hardware implementation can be lowered either by scaling the number of units or reducing the word-length of weights.

Quantization

Compact Deep Convolutional Neural Networks With Coarse Pruning

no code implementations 30 Oct 2016 Sajid Anwar, Wonyong Sung

We propose feature map and kernel level pruning for reducing the computational complexity of a deep convolutional neural network.

Network Pruning

Character-Level Language Modeling with Hierarchical Recurrent Neural Networks

no code implementations 13 Sep 2016 Kyuyeon Hwang, Wonyong Sung

Recurrent neural network (RNN) based character-level language models (CLMs) are extremely useful for modeling out-of-vocabulary words by nature.

Language Modelling, Speech Recognition +1

Generative Knowledge Transfer for Neural Language Models

no code implementations 14 Aug 2016 Sungho Shin, Kyuyeon Hwang, Wonyong Sung

In this paper, we propose a generative knowledge transfer technique that trains an RNN based language model (student network) using text and output probabilities generated from a previously trained RNN (teacher network).

Language Modelling, Text Generation +1

FPGA Based Implementation of Deep Neural Networks Using On-chip Memory Only

no code implementations 4 Feb 2016 Jinhwan Park, Wonyong Sung

In this work, we have developed an FPGA-based fixed-point DNN system that uses only on-chip memory and thus avoids accessing external DRAM.

Handwritten Digit Recognition

Character-Level Incremental Speech Recognition with Recurrent Neural Networks

1 code implementation 25 Jan 2016 Kyuyeon Hwang, Wonyong Sung

The output values of the CTC-trained RNN are character-level probabilities, which are processed by beam search decoding.

Language Modelling, Speech Recognition +1

Online Keyword Spotting with a Character-Level Recurrent Neural Network

no code implementations 30 Dec 2015 Kyuyeon Hwang, Minjae Lee, Wonyong Sung

In this paper, we propose a context-aware keyword spotting model employing a character-level recurrent neural network (RNN) for spoken term detection in continuous speech.

General Classification, Keyword Spotting

Structured Pruning of Deep Convolutional Neural Networks

1 code implementation 29 Dec 2015 Sajid Anwar, Kyuyeon Hwang, Wonyong Sung

To decide the importance of network connections and paths, the proposed method uses a particle filtering approach.

Network Pruning

Fixed-Point Performance Analysis of Recurrent Neural Networks

no code implementations 4 Dec 2015 Sungho Shin, Kyuyeon Hwang, Wonyong Sung

Recurrent neural networks have shown excellent performance in many applications; however, they require increased complexity in hardware- or software-based implementations.

Language Modelling, Quantization

Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification

no code implementations 21 Nov 2015 Kyuyeon Hwang, Wonyong Sung

Our online model achieves 20.7% phoneme error rate (PER) on the very long input sequence that is generated by concatenating all 192 utterances in the TIMIT core test set.

General Classification, Rolling Shutter Correction +2

Resiliency of Deep Neural Networks under Quantization

no code implementations 20 Nov 2015 Wonyong Sung, Sungho Shin, Kyuyeon Hwang

In this work, the effects of retraining are analyzed for a feedforward deep neural network (FFDNN) and a convolutional neural network (CNN).

Quantization

Single stream parallelization of generalized LSTM-like RNNs on a GPU

no code implementations 10 Mar 2015 Kyuyeon Hwang, Wonyong Sung

Recurrent neural networks (RNNs) have shown outstanding performance on processing sequence data.
