1 code implementation • 11 Mar 2024 • Hongsun Jang, Jaeyong Song, Jaewon Jung, Jaeyoung Park, Youngsok Kim, Jinho Lee
Our work, Smart-Infinity, addresses the storage bandwidth bottleneck of storage-offloaded LLM training using near-storage processing devices on a real system.
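As a rough illustration only (not Smart-Infinity's implementation; the file layout and the `offloaded_sgd_step` helper are hypothetical), the sketch below shows why storage-offloaded training is bandwidth-bound: every optimizer step streams parameter state through the storage link, which is exactly the traffic that near-storage processing can keep on the device side.

```python
# Minimal sketch of a storage-offloaded update step (illustrative, not
# Smart-Infinity's actual mechanism): optimizer state lives on disk and is
# streamed in and out per step, so throughput is bound by storage bandwidth.
import numpy as np

def offloaded_sgd_step(param_path, grad, lr=1e-3):
    # Read the offloaded parameter shard from storage (the bottleneck).
    param = np.fromfile(param_path, dtype=np.float32)
    param -= lr * grad            # the arithmetic itself is cheap
    param.tofile(param_path)      # write the shard back to storage
    return param
```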
no code implementations • 12 Nov 2023 • Jaeyong Song, Hongsun Jang, Jaewon Jung, Youngsok Kim, Jinho Lee
As the datasets and model sizes used for GNNs grow, an important problem is that it becomes nearly impossible to keep the whole network in GPU memory.
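One common workaround, sketched below under simplified assumptions (adjacency stored as a Python dict; a generic technique, not necessarily this paper's method), is to sample fixed-size neighborhoods so that only a small subgraph batch ever has to reside in GPU memory.

```python
# Minimal sketch of fixed-fanout neighborhood sampling (names illustrative):
# train on a sampled subgraph instead of keeping the full graph on the GPU.
import random

def sample_neighborhood(adj, seeds, fanout=10):
    """adj: dict mapping node id -> list of neighbor ids."""
    nodes = set(seeds)
    for v in seeds:
        neighbors = adj[v]
        k = min(fanout, len(neighbors))
        nodes.update(random.sample(neighbors, k))
    return nodes  # only this subgraph is moved to the GPU
```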
1 code implementation • 29 Jan 2023 • Hongsun Jang, Jaewon Jung, Jaeyong Song, Joonsang Yu, Youngsok Kim, Jinho Lee
However, this results in a high overhead of redundant teacher execution, low GPU utilization, and extra data loading.
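To make the redundancy concrete, the sketch below caches the frozen teacher's logits so the teacher runs exactly once per sample; this is a generic mitigation shown for illustration, not the paper's approach, and the function and variable names are assumptions.

```python
# Sketch: in vanilla distillation, every student step re-runs the frozen
# teacher on the same inputs. Caching teacher outputs removes that redundancy.
import torch

@torch.no_grad()
def cache_teacher_logits(teacher, loader, device="cpu"):
    cache = {}
    for idx, (x, _) in enumerate(loader):
        cache[idx] = teacher(x.to(device)).cpu()  # teacher runs once per batch
    return cache
```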
1 code implementation • 25 Jan 2023 • Mingi Yoo, Jaeyong Song, Jounghoo Lee, Namhyung Kim, Youngsok Kim, Jinho Lee
A GCN takes as input an arbitrarily structured graph and executes a series of layers which exploit the graph's structure to calculate their output features.
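Concretely, a single GCN layer in the standard formulation of Kipf and Welling computes H' = σ(ÂHW), where Â is the self-loop-augmented, symmetrically normalized adjacency. A minimal PyTorch version for a dense adjacency matrix:

```python
# One standard GCN layer: H' = relu(A_hat @ H @ W), with A_hat the
# symmetrically normalized adjacency including self-loops.
import torch

def gcn_layer(A, H, W):
    A_hat = A + torch.eye(A.size(0))           # add self-loops
    d = A_hat.sum(dim=1)                       # node degrees (>= 1 here)
    D_inv_sqrt = torch.diag(d.pow(-0.5))       # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # normalized adjacency
    return torch.relu(A_norm @ H @ W)          # aggregate, then transform
```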
no code implementations • 24 Jan 2023 • Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, Hyung-Jin Kim, Youngsok Kim, Jinho Lee
Compressing the communication is one way to mitigate the overhead by reducing the inter-node traffic volume; however, existing compression techniques have critical limitations when applied to NLP models with 3D parallelism: 1) only the data-parallelism traffic is targeted, and 2) the existing compression schemes already harm the model quality too much.
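For reference, a generic top-k scheme (a common baseline, not necessarily the compression used in this work) looks like the following; it cuts traffic to roughly the chosen ratio at the cost of sparsification error.

```python
# Generic top-k gradient compression sketch: transmit only the k entries
# with the largest magnitude, plus their indices.
import torch

def topk_compress(grad, ratio=0.01):
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = flat.abs().topk(k)
    return flat[indices], indices          # ~ratio of the original traffic

def topk_decompress(values, indices, shape):
    out = torch.zeros(shape).flatten()
    out[indices] = values                  # all other entries stay zero
    return out.reshape(shape)
```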
no code implementations • 24 Jan 2023 • Mingi Yoo, Jaeyong Song, Hyeyoon Lee, Jounghoo Lee, Namhyung Kim, Youngsok Kim, Jinho Lee
Graph convolutional networks (GCNs) are becoming increasingly popular as they can process a wide variety of data formats that prior deep neural networks cannot easily support.
no code implementations • 23 Jan 2023 • Deokki Hong, Kanghyun Choi, Hye Yoon Lee, Joonsang Yu, Noseong Park, Youngsok Kim, Jinho Lee
Co-exploration of an optimal neural architecture and its hardware accelerator is an approach of rising interest that addresses the computational cost problem, especially in low-profile systems.
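In its simplest differentiable form (an illustrative objective, not this paper's exact formulation; the weights and estimator names are assumptions), co-exploration folds hardware cost estimates into the training loss so that architecture and accelerator parameters are optimized jointly.

```python
# Illustrative joint objective for differentiable co-exploration.
def co_exploration_loss(task_loss, latency_est, energy_est,
                        lambda_lat=0.1, lambda_en=0.05):
    # latency_est and energy_est must be differentiable functions of both
    # the architecture parameters and the accelerator design parameters
    # for gradient-based joint search to work.
    return task_loss + lambda_lat * latency_est + lambda_en * energy_est
```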
1 code implementation • CVPR 2022 • Kanghyun Choi, Hye Yoon Lee, Deokki Hong, Joonsang Yu, Noseong Park, Youngsok Kim, Jinho Lee
To deal with the performance drop induced by quantization errors, a popular method is to use training data to fine-tune quantized networks.
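Such fine-tuning typically relies on fake quantization with a straight-through estimator (STE) so gradients can flow through the rounding step; a minimal sketch, assuming symmetric per-tensor quantization:

```python
# Fake quantization with a straight-through estimator: the forward pass uses
# quantized weights, while gradients pass through to the full-precision ones.
import torch

def fake_quantize(w, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max() / qmax + 1e-12
    w_q = torch.round(w / scale).clamp(-qmax - 1, qmax) * scale
    return w + (w_q - w).detach()  # STE trick
```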
2 code implementations • NeurIPS 2021 • Kanghyun Choi, Deokki Hong, Noseong Park, Youngsok Kim, Jinho Lee
We find that this is often insufficient to capture the distribution of the original data, especially around the decision boundaries.
Ranked #1 on Data Free Quantization on CIFAR-100
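A minimal sketch of the boundary-supporting-sample idea, with the generator and embedding details heavily simplified relative to the paper: superposing two class embeddings steers synthetic samples toward the region between those classes, where the original data distribution is otherwise poorly captured.

```python
# Sketch in the spirit of boundary-supporting sample generation: feed the
# generator a convex combination of two class embeddings so that synthetic
# samples land near the decision boundary between those classes.
import torch

def superposed_embedding(class_emb, labels_a, labels_b, alpha=0.5):
    # class_emb: (num_classes, dim) learned per-class embedding table
    return alpha * class_emb[labels_a] + (1 - alpha) * class_emb[labels_b]
```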
no code implementations • 29 Sep 2021 • Deokki Hong, Kanghyun Choi, Hye Yoon Lee, Joonsang Yu, Youngsok Kim, Noseong Park, Jinho Lee
To handle the hard constraint problem of differentiable co-exploration, we propose ConCoDE, which searches for hard-constrained solutions without compromising the global design objectives.
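For illustration only (not necessarily ConCoDE's formulation; the budget and penalty parameters are assumptions), one way to express a hard constraint in a differentiable search is a violation-only penalty, which leaves feasible designs ranked purely by the global objective:

```python
# Violation-only penalty for a hard latency constraint: the penalty term is
# zero for any feasible design, so it never distorts the global objective.
import torch

def constrained_loss(objective, latency_est, latency_budget, penalty=10.0):
    violation = torch.relu(latency_est - latency_budget)  # zero when feasible
    return objective + penalty * violation
```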
no code implementations • 14 Sep 2020 • Kanghyun Choi, Deokki Hong, Hojae Yoon, Joonsang Yu, Youngsok Kim, Jinho Lee
In such circumstances, this work presents DANCE, a differentiable approach towards the co-exploration of the hardware accelerator and network architecture design.
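The differentiable ingredient can be sketched as a small neural surrogate that mimics a non-differentiable hardware cost tool, letting the search backpropagate through cost estimates; the shapes and structure below are illustrative assumptions, not DANCE's exact model.

```python
# Sketch of a learned cost surrogate: train it to imitate a (non-
# differentiable) hardware cost estimator, then backpropagate through it
# during architecture/accelerator search.
import torch.nn as nn

cost_surrogate = nn.Sequential(   # (arch, hw) design encoding -> cost
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 2),             # e.g., predicted latency and energy
)
```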