Search Results for author: Jungwook Choi

Found 24 papers, 7 papers with code

Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment

no code implementations3 Jul 2024 Janghwan Lee, Seongmin Park, Sukjin Hong, Minsoo Kim, Du-Seong Chang, Jungwook Choi

The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced techniques such as instruction tuning and reinforcement learning from human feedback (RLHF).

Chatbot Computational Efficiency +2

Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization

no code implementations9 Nov 2023 Janghwan Lee, Minsoo Kim, SeungCheol Baek, Seok Joong Hwang, Wonyong Sung, Jungwook Choi

Large Language Models (LLMs) are proficient in natural language processing tasks, but their deployment is often restricted by extensive parameter sizes and computational demands.

Computational Efficiency Quantization

SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving

no code implementations12 May 2023 Minjae Lee, Seongmin Park, Hyungmin Kim, Minyong Yoon, Janghwan Lee, Jun Won Choi, Nam Sung Kim, Mingu Kang, Jungwook Choi

3D object detection using point cloud (PC) data is essential for perception pipelines of autonomous driving, where efficient encoding is key to meeting stringent resource and latency requirements.

3D Object Detection Autonomous Driving +2

Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers

1 code implementation23 Feb 2023 Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, Jungwook Choi

Pre-trained Transformer models such as BERT have shown great success in a wide range of applications, but at the cost of substantial increases in model complexity.

Knowledge Distillation Quantization

Exploring Attention Map Reuse for Efficient Transformer Neural Networks

no code implementations29 Jan 2023 Kyuhong Shim, Jungwook Choi, Wonyong Sung

In this paper, we provide a comprehensive study on attention map reuse focusing on its ability to accelerate inference.

speech-recognition Speech Recognition

Automatic Network Adaptation for Ultra-Low Uniform-Precision Quantization

no code implementations21 Dec 2022 Seongmin Park, Beomseok Kwon, Jieun Lim, Kyuyoung Sim, Tae-Ho Kim, Jungwook Choi

Uniform-precision neural network quantization has gained popularity since it simplifies densely packed arithmetic unit for high computing capability.

Neural Architecture Search Quantization

Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders

1 code implementation20 Nov 2022 Minsoo Kim, Sihwa Lee, Sukjin Hong, Du-Seong Chang, Jungwook Choi

In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders like BERT to improve the accuracy of the student model with the reduced-precision weight parameters.

Knowledge Distillation Model Compression +1

Learning from distinctive candidates to optimize reduced-precision convolution program on tensor cores

no code implementations11 Feb 2022 Junkyeong Choi, Hyucksung Kwon, Woongkyu Lee, Jungwook Choi, Jieun Lim

In this method, we devise a search space that explores the thread tile and warp sizes to increase the data reuse despite a large matrix operand of reduced-precision MMA.

Scheduling

Thermal Face Detection for High-Speed AI Thermometer

1 code implementation 2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC) 2022 Woongkyu Lee, Hyucksung Kwon, Jungwook Choi

However, the computation-demanding nature of DNNs, along with the time-consuming fusion of video and thermal camera frames, raises hurdles for the cost-effective deployment of such AI thermometer systems.

Face Detection object-detection +2

NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference

no code implementations3 Dec 2021 Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi

Non-linear operations such as GELU, Layer normalization, and Softmax are essential yet costly building blocks of Transformer models.

TernGEMM: GEneral Matrix Multiply Library with Ternary Weights for Fast DNN Inference

1 code implementation 2021 IEEE Workshop on Signal Processing Systems (SiPS) 2021 Seokhyeon Choi, Kyuhong Shim, Jungwook Choi, Wonyong Sung, Byonghyo Shim

We propose TernGEMM, a special GEMM library using SIMD instructions for Deep Neural Network (DNN) inference with ternary weights and activations under 8-bit.

Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling

1 code implementation 2021 18th International SoC Design Conference (ISOCC) 2021 Kyuhong Shim, Iksoo Choi, Wonyong Sung, Jungwook Choi

While Transformer-based models have shown impressive language modeling performance, the large computation cost is often prohibitive for practical use.

Language Modelling

Robust Machine Learning Systems: Challenges, Current Trends, Perspectives, and the Road Ahead

no code implementations4 Jan 2021 Muhammad Shafique, Mahum Naseer, Theocharis Theocharides, Christos Kyrkou, Onur Mutlu, Lois Orosa, Jungwook Choi

Machine Learning (ML) techniques have been rapidly adopted by smart Cyber-Physical Systems (CPS) and Internet-of-Things (IoT) due to their powerful decision-making capabilities.

BIG-bench Machine Learning Decision Making

Uniform-Precision Neural Network Quantization via Neural Channel Expansion

no code implementations1 Jan 2021 Seongmin Park, Beomseok Kwon, Kyuyoung Sim, Jieun Lim, Tae-Ho Kim, Jungwook Choi

Uniform-precision neural network quantization has gained popularity thanks to its simple arithmetic unit densely packed for high computing capability.

Neural Architecture Search Quantization

The Sooner The Better: Investigating Structure of Early Winning Lottery Tickets

no code implementations25 Sep 2019 Shihui Yin, Kyu-Hyoun Kim, Jinwook Oh, Naigang Wang, Mauricio Serrano, Jae-sun Seo, Jungwook Choi

In the case of ResNet50 on ImageNet, this comes to the winning ticket of 75:02% Top-1 accuracy at 80% pruning rate in only 22% of the total epochs for iterative pruning.

Memorization

Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks

no code implementations ICLR 2019 Charbel Sakr, Naigang Wang, Chia-Yu Chen, Jungwook Choi, Ankur Agrawal, Naresh Shanbhag, Kailash Gopalakrishnan

Observing that a bad choice for accumulation precision results in loss of information that manifests itself as a reduction in variance in an ensemble of partial sums, we derive a set of equations that relate this variance to the length of accumulation and the minimum number of bits needed for accumulation.

Training Deep Neural Networks with 8-bit Floating Point Numbers

no code implementations NeurIPS 2018 Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, Kailash Gopalakrishnan

The state-of-the-art hardware platforms for training Deep Neural Networks (DNNs) are moving from traditional single precision (32-bit) computations towards 16 bits of precision -- in large part due to the high energy efficiency and smaller bit storage associated with using reduced-precision representations.

PACT: Parameterized Clipping Activation for Quantized Neural Networks

3 code implementations ICLR 2018 Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan

We show, for the first time, that both weights and activations can be quantized to 4-bits of precision while still achieving accuracy comparable to full precision networks across a range of popular models and datasets.

Quantization

AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training

no code implementations7 Dec 2017 Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei zhang, Kailash Gopalakrishnan

Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100 of TeraOps/s of computational capacity) is expected to be severely communication constrained.

Quantization

Cannot find the paper you are looking for? You can Submit a new open access paper.