Search Results for author: Hyoukjun Kwon

Found 10 papers, 1 paper with code

NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads

no code implementations • 17 Apr 2024 • Rachid Karami, Hemanth Kota, Sheng-Chun Kao, Hyoukjun Kwon

Therefore, significant effort has been put into studying and optimizing GEMM operators to speed up the execution of ML models.
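
As a rough illustration of the measurement this benchmark motivates, the sketch below uses PyTorch's profiler to split operator time into GEMM and non-GEMM buckets. The operator list and toy model are assumptions for illustration, not NonGEMM Bench's actual methodology.

```python
# Sketch: split self-CPU operator time into GEMM vs non-GEMM buckets.
# The GEMM name set below is an illustrative assumption, not the
# taxonomy used by NonGEMM Bench itself.
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

GEMM_OPS = {"aten::mm", "aten::addmm", "aten::bmm", "aten::matmul"}

model = nn.Sequential(
    nn.Linear(512, 512), nn.GELU(),
    nn.LayerNorm(512), nn.Linear(512, 512), nn.Softmax(dim=-1),
)
x = torch.randn(64, 512)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)

gemm_us = nongemm_us = 0.0
for evt in prof.key_averages():
    if evt.key in GEMM_OPS:
        gemm_us += evt.self_cpu_time_total
    else:
        nongemm_us += evt.self_cpu_time_total

total = gemm_us + nongemm_us
print(f"GEMM: {gemm_us / total:.1%}, non-GEMM: {nongemm_us / total:.1%}")
```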

Inter-Layer Scheduling Space Exploration for Multi-model Inference on Heterogeneous Chiplets

no code implementations • 14 Dec 2023 • Mohanad Odema, Hyoukjun Kwon, Mohammad Abdullah Al Faruque

To address the increasing compute demand from recent multi-model workloads with heavy models such as large language models, we propose to deploy heterogeneous chiplet-based multi-chip module (MCM) accelerators.

Scheduling
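
To give a feel for the scheduling-space exploration this paper studies, here is a toy exhaustive search that assigns layers from two models to hypothetical heterogeneous chiplets and keeps the assignment with the smallest makespan. The chiplet specs and latency model are invented for illustration, not the paper's actual framework.

```python
# Sketch: toy inter-layer scheduling over a heterogeneous two-chiplet MCM.
# Throughputs, layer sizes, and the cost model are illustrative assumptions.
from itertools import product

# Per-chiplet throughput in MACs/cycle (hypothetical heterogeneous chiplets).
chiplets = {"conv_opt": 512, "gemm_opt": 1024}

# Layers as (name, MAC count); two models' layers interleaved.
layers = [("m0.conv1", 2**24), ("m1.attn", 2**26), ("m0.fc", 2**22)]

best = None
for assignment in product(chiplets, repeat=len(layers)):
    # Makespan: each chiplet runs its assigned layers sequentially.
    busy = {c: 0.0 for c in chiplets}
    for (name, macs), c in zip(layers, assignment):
        busy[c] += macs / chiplets[c]
    makespan = max(busy.values())
    if best is None or makespan < best[0]:
        best = (makespan, assignment)

print(f"best makespan: {best[0]:.0f} cycles via {best[1]}")
```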

DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads

no code implementations • 7 Dec 2022 • Seah Kim, Hyoukjun Kwon, Jinook Song, Jihyuck Jo, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra

Such dynamic behaviors introduce new challenges to the system software of an ML system, since the overall system load is not completely predictable, unlike that of traditional ML workloads.

Scheduling
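
For flavor, here is a minimal sketch of deadline-aware dynamic dispatch, assuming a least-slack-first policy re-evaluated at every scheduling tick. This scoring rule is an illustrative assumption, not DREAM's actual scheduler.

```python
# Sketch: least-slack-first dispatch for dynamically arriving model jobs.
# The slack-based priority is an assumption for illustration only.
import heapq

def dispatch(jobs, now):
    """jobs: list of (deadline, est_runtime, name).
    Returns a run order, least slack (deadline - now - est_runtime) first."""
    heap = [(deadline - now - est, name) for deadline, est, name in jobs]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# Jobs arrive dynamically; re-rank whenever the job set changes.
print(dispatch([(30, 12, "detector"), (18, 5, "tracker"), (40, 25, "seg")], now=0))
```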

Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation

1 code implementation • CVPR 2022 • Jiaqi Gu, Hyoukjun Kwon, Dilin Wang, Wei Ye, Meng Li, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra, David Z. Pan

Therefore, we propose HRViT, which enhances ViTs to learn semantically-rich and spatially-precise multi-scale representations by integrating high-resolution multi-branch architectures with ViTs.

Image Classification • Representation Learning +3
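
The sketch below shows a toy two-branch fusion module in the spirit of high-resolution multi-branch architectures: a spatially precise high-resolution branch and a semantically rich low-resolution branch exchange information. The layer shapes and fusion scheme are assumptions for illustration, not HRViT's published design.

```python
# Sketch: toy cross-resolution fusion between two branches.
# Channel widths and the fusion operators are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchFusion(nn.Module):
    def __init__(self, hi_ch=32, lo_ch=64):
        super().__init__()
        self.down = nn.Conv2d(hi_ch, lo_ch, 3, stride=2, padding=1)  # hi-res -> lo-res
        self.up = nn.Conv2d(lo_ch, hi_ch, 1)                         # lo-res -> hi-res

    def forward(self, hi, lo):
        # Exchange information across resolutions while keeping both branches.
        lo_out = lo + self.down(hi)
        hi_out = hi + F.interpolate(self.up(lo), size=hi.shape[-2:],
                                    mode="bilinear", align_corners=False)
        return hi_out, lo_out

hi = torch.randn(1, 32, 64, 64)   # spatially precise branch
lo = torch.randn(1, 64, 32, 32)   # semantically rich branch
hi2, lo2 = TwoBranchFusion()(hi, lo)
print(hi2.shape, lo2.shape)
```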

Marvel: A Data-centric Compiler for DNN Operators on Spatial Accelerators

no code implementations • 18 Feb 2020 • Prasanth Chatarasi, Hyoukjun Kwon, Natesh Raina, Saurabh Malik, Vaisakh Haridas, Angshuman Parashar, Michael Pellauer, Tushar Krishna, Vivek Sarkar

Searching for the optimal mapping is challenging because of the large mapping space, and the challenge is exacerbated by new operators and diverse accelerator configurations. To address this, we propose a decoupled off-chip/on-chip approach that decomposes the mapping space into off-chip and on-chip subspaces, optimizing the off-chip subspace first and the on-chip subspace second.
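
Here is a minimal sketch of that decoupled two-phase search, with stand-in cost functions; Marvel's real data-centric cost models and subspaces are not reproduced here.

```python
# Sketch: two-phase (decoupled) mapping search over stand-in cost models.
# Both cost functions below are toy proxies, not Marvel's analysis.
def offchip_cost(tile):
    # Proxy for DRAM traffic: tile count times per-tile transfer volume.
    th, tw = tile
    n_tiles = (64 // th) * (64 // tw)
    return n_tiles * (th * tw + th + tw)

def onchip_cost(order, tile):
    # Proxy for on-chip reuse: prefer one loop order, reward larger tiles.
    th, tw = tile
    penalty = 1.0 if order == "hw" else 1.2
    return penalty / (th * tw)

tiles = [(4, 4), (8, 8), (16, 16)]
orders = ["hw", "wh"]

# Phase 1: optimize the off-chip subspace (tiling) alone.
best_tile = min(tiles, key=offchip_cost)
# Phase 2: optimize the on-chip subspace (loop order) given the chosen tile.
best_order = min(orders, key=lambda o: onchip_cost(o, best_tile))

print(best_tile, best_order)
```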

Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks

no code implementations • 10 Feb 2020 • Lei Yang, Zheyu Yan, Meng Li, Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, Vikas Chandra, Weiwen Jiang, Yiyu Shi

Neural Architecture Search (NAS) has demonstrated its power on various AI accelerating platforms such as Field Programmable Gate Arrays (FPGAs) and Graphic Processing Units (GPUs).

Neural Architecture Search

Heterogeneous Dataflow Accelerators for Multi-DNN Workloads

no code implementations • 13 Sep 2019 • Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, Vikas Chandra

The results suggest that HDAs are an alternative class of Pareto-optimal accelerators to RDAs, with strength in energy efficiency, which can make them a better choice than RDAs depending on the use case.

Distributed, Parallel, and Cluster Computing

Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach Using MAESTRO

no code implementations • 4 May 2018 • Hyoukjun Kwon, Prasanth Chatarasi, Michael Pellauer, Angshuman Parashar, Vivek Sarkar, Tushar Krishna

The data partitioning and scheduling strategies used by DNN accelerators to leverage reuse and perform staging are known as dataflow, and they directly impact the performance and energy efficiency of DNN accelerator designs.

Scheduling
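
To make the dataflow notion concrete, the sketch below encodes a dataflow as a list of data-centric directives in the spirit of MAESTRO's spatial/temporal mapping directives. The directive names and the toy step estimate are assumptions, not MAESTRO's actual syntax or cost analysis.

```python
# Sketch: a data-centric dataflow description with spatial/temporal
# mapping directives. Names and the estimate are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Directive:
    kind: str   # "spatial_map" or "temporal_map"
    size: int   # mapping size per step
    dim: str    # tensor dimension, e.g. "K", "C", "Y", "X"

# A weight-stationary-like dataflow over a conv layer's K/C dimensions.
dataflow = [
    Directive("spatial_map", 1, "K"),   # spread output channels across PEs
    Directive("temporal_map", 8, "C"),  # iterate input channels over time
]

def temporal_steps(dataflow, dims):
    """Toy estimate: product of temporal iteration counts."""
    steps = 1
    for d in dataflow:
        if d.kind == "temporal_map":
            steps *= -(-dims[d.dim] // d.size)  # ceiling division
        # spatial maps consume PEs instead of time (ignored here)
    return steps

print(temporal_steps(dataflow, {"K": 64, "C": 128}))  # -> 16
```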
