Search Results for author: Geonhwa Jeong

Found 8 papers, 3 papers with code

Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition

no code implementations • 12 Mar 2024 • Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna

Next, we develop a software framework, TASDER, to accelerate DNNs by searching for layer-wise, high-quality structured decompositions of both weight and activation tensors, so that they can be accelerated by any system with structured sparse hardware support.

Tensor Decomposition
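
The core idea, decomposing a dense tensor into a sum of structured-sparse (e.g., 2:4) components, can be illustrated with a short NumPy sketch. The greedy peeling strategy and function names below are illustrative assumptions, not TASDER's actual search:

```python
import numpy as np

def nm_sparsify(w, n=2, m=4):
    """Keep the n largest-magnitude entries in every group of m along each row."""
    rows, cols = w.shape
    assert cols % m == 0
    out = np.zeros_like(w)
    groups = w.reshape(rows, cols // m, m)
    keep = np.argsort(-np.abs(groups), axis=-1)[..., :n]  # top-n per group
    np.put_along_axis(out.reshape(rows, cols // m, m), keep,
                      np.take_along_axis(groups, keep, axis=-1), axis=-1)
    return out

def structured_decompose(w, terms=2):
    """Greedily peel off 2:4-sparse components whose sum approximates w."""
    parts, residual = [], w.copy()
    for _ in range(terms):
        part = nm_sparsify(residual)
        parts.append(part)
        residual = residual - part
    return parts

w = np.random.randn(8, 16)
for t in (1, 2):
    err = np.linalg.norm(w - sum(structured_decompose(w, terms=t))) / np.linalg.norm(w)
    print(f"{t} structured term(s): relative error {err:.3f}")
```

With 2:4 groups, two greedy terms cover every element, so the second line prints an error near zero; a real search instead trades off the number of terms against accuracy and hardware cost per layer.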

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

1 code implementation • 8 Mar 2024 • Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao

Key-value (KV) caching has become the de facto technique for accelerating generation in large language model (LLM) inference.

Quantization
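
A minimal sketch of the flavor of technique GEAR builds on: quantize the KV cache to low precision and repair the quantization residual with a low-rank term. The 4-bit and rank-4 choices and all function names are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def quantize_uniform(x, bits=4):
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2**bits - 1)
    q = np.round((x - lo) / scale).astype(np.uint8)  # 4-bit codes held in a uint8 for simplicity
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

def compress_kv(kv, bits=4, rank=4):
    q, scale, lo = quantize_uniform(kv, bits)
    residual = kv - dequantize(q, scale, lo)
    # low-rank repair of the quantization error via truncated SVD
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    return q, scale, lo, u[:, :rank] * s[:rank], vt[:rank]

def decompress_kv(q, scale, lo, us, vt):
    return dequantize(q, scale, lo) + us @ vt

kv = np.random.randn(128, 64).astype(np.float32)  # (tokens, head_dim)
packed = compress_kv(kv)
err = np.linalg.norm(kv - decompress_kv(*packed)) / np.linalg.norm(kv)
print(f"relative reconstruction error: {err:.4f}")
```

The appeal is that the quantized codes plus a rank-r correction cost far less memory than the full-precision cache while keeping reconstruction error small.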

Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference

1 code implementation • 8 Mar 2024 • Akshat Ramachandran, Zishen Wan, Geonhwa Jeong, John Gustafson, Tushar Krishna

Traditional Deep Neural Network (DNN) quantization methods using integer, fixed-point, or floating-point data types struggle to capture diverse DNN parameter distributions at low precision, and often require large silicon overhead and intensive quantization-aware training.

Quantization
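
As a toy illustration of why log-domain encodings suit bell-shaped parameter distributions, the sketch below snaps values to signed powers of two within a bounded exponent range. It is a hypothetical stand-in, not the paper's logarithmic-posit (LP) format:

```python
import numpy as np

def log2_quantize(x, exp_min=-8, exp_max=0):
    """Round each value to the nearest signed power of two in [2^exp_min, 2^exp_max]."""
    sign = np.sign(x)
    mag = np.abs(x)
    exp = np.clip(np.round(np.log2(np.maximum(mag, 2.0**exp_min))),
                  exp_min, exp_max)
    q = sign * 2.0**exp
    q[mag < 2.0**(exp_min - 1)] = 0.0  # flush tiny magnitudes to zero
    return q

w = np.random.randn(10_000) * 0.1  # bell-shaped, concentrated near zero
wq = log2_quantize(w)
print("mean abs error:", np.mean(np.abs(w - wq)))
```

Because the representable levels are densest near zero, where most DNN weights live, a log-domain code spends its few bits where the distribution has mass.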

VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs

no code implementations • 17 Feb 2023 • Geonhwa Jeong, Sana Damani, Abhimanyu Rajeshkumar Bambhaniya, Eric Qin, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna

Therefore, as DL workloads embrace sparsity to reduce the computation and memory footprint of models, it is also imperative for CPUs to add support for sparsity to avoid under-utilization of the dense matrix engine and inefficient use of the caches and registers.
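
A structured-sparse GEMM of the kind such hardware accelerates can be sketched in NumPy: the 2:4 operand is stored as two values plus two column indices per group of four, halving its storage. The layout and names are illustrative, not VEGETA's ISA extension:

```python
import numpy as np

def compress_2to4(a):
    """Compress a 2:4-sparse matrix: 2 values + 2 column indices per group of 4."""
    rows, cols = a.shape
    groups = a.reshape(rows, cols // 4, 4)
    idx = np.argsort(-np.abs(groups), axis=-1)[..., :2]
    idx.sort(axis=-1)
    vals = np.take_along_axis(groups, idx, axis=-1)
    return vals, idx  # half the storage of the dense operand

def sparse_dense_gemm(vals, idx, b):
    """C = A @ B where A is given in compressed 2:4 form."""
    rows, ngroups, _ = vals.shape
    c = np.zeros((rows, b.shape[1]), dtype=b.dtype)
    for g in range(ngroups):
        cols = g * 4 + idx[:, g, :]  # absolute column indices, shape (rows, 2)
        for k in range(2):
            c += vals[:, g, k:k+1] * b[cols[:, k], :]  # gather + MAC
    return c

a = np.random.randn(8, 16)
g = a.reshape(8, 4, 4)  # zero the 2 smallest per group so A is exactly 2:4 sparse
np.put_along_axis(g, np.argsort(np.abs(g), axis=-1)[..., :2], 0.0, axis=-1)
b = np.random.randn(16, 5)
vals, idx = compress_2to4(a)
print(np.allclose(sparse_dense_gemm(vals, idx, b), a @ b))  # True
```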

RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU

no code implementations • 5 Oct 2021 • Geonhwa Jeong, Eric Qin, Ananda Samajdar, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna

As AI-based applications become pervasive, CPU vendors are starting to incorporate matrix engines within the datapath to boost efficiency.
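
For intuition, here is a toy cycle-level model of an output-stationary systolic array: operands are skewed and streamed in from the west and north edges, and every PE performs one MAC per cycle. The register and pipelining details the paper actually studies are not modeled; this is an assumption-laden illustration:

```python
import numpy as np

def systolic_matmul(a, b):
    """Simulate an output-stationary m x n PE grid computing C = A @ B."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    a_reg = np.zeros((m, n))  # horizontal (west-to-east) operand registers
    b_reg = np.zeros((m, n))  # vertical (north-to-south) operand registers
    acc = np.zeros((m, n))    # one stationary accumulator per PE
    for t in range(k + m + n - 2):  # cycles until the farthest PE drains
        # propagate operands one hop per cycle
        a_reg[:, 1:] = a_reg[:, :-1].copy()
        b_reg[1:, :] = b_reg[:-1, :].copy()
        # inject skewed inputs at the edges: row i is delayed by i cycles, etc.
        for i in range(m):
            kk = t - i
            a_reg[i, 0] = a[i, kk] if 0 <= kk < k else 0.0
        for j in range(n):
            kk = t - j
            b_reg[0, j] = b[kk, j] if 0 <= kk < k else 0.0
        acc += a_reg * b_reg  # every PE does one MAC per cycle
    return acc

a, b = np.random.randn(3, 4), np.random.randn(4, 2)
print(np.allclose(systolic_matmul(a, b), a @ b))  # True
```

The skewing makes a[i, kk] and b[kk, j] arrive at PE(i, j) on the same cycle (t = kk + i + j), which is exactly the alignment a real systolic datapath enforces in hardware.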

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators

no code implementations • 15 Sep 2021 • Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna

The algorithms and accelerator cost models are connected via a novel mapping abstraction that captures the map space of spatial accelerators, which can be systematically pruned based on constraints from the hardware, workload, and mapper.
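
A toy rendition of such a map space, assuming a made-up single-buffer capacity constraint and a crude iteration-count cost proxy (neither is Union's actual model), might look like:

```python
import itertools

def divisors(x):
    return [d for d in range(1, x + 1) if x % d == 0]

def map_space(M, N, K, buffer_words=4096):
    """Enumerate GEMM tilings; yield only mappings that fit the buffer."""
    for tm, tn, tk in itertools.product(divisors(M), divisors(N), divisors(K)):
        footprint = tm * tk + tk * tn + tm * tn  # A, B, C tiles resident at once
        if footprint > buffer_words:
            continue  # pruned by the hardware constraint
        # crude cost proxy: number of tile iterations (fewer = more reuse)
        cost = (M // tm) * (N // tn) * (K // tk)
        yield (tm, tn, tk), cost

best = min(map_space(256, 256, 256), key=lambda mc: mc[1])
print("best mapping (tile_M, tile_N, tile_K), cost:", best)
```

Constraint-based pruning like this is what keeps exhaustive map-space search tractable: infeasible points are discarded before any cost model is evaluated.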
