1 code implementation • 20 Mar 2024 • Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun
This study provides a comprehensive analysis of RoPE when applied to ViTs, utilizing practical implementations of RoPE for 2D vision data.
1 code implementation • 15 Dec 2023 • Minhyun Lee, Song Park, Byeongho Heo, Dongyoon Han, Hyunjung Shim
A recent breakthrough by SeiT proposed the use of Vector-Quantized (VQ) feature vectors (i.e., tokens) as network inputs for vision classification.
1 code implementation • ICCV 2023 • Song Park, Sanghyuk Chun, Byeongho Heo, Wonjae Kim, Sangdoo Yun
We need billion-scale images to achieve more generalizable and ground-breaking vision models, as well as massive dataset storage to ship the images (e.g., the LAION-4B dataset needs 240TB storage space).
no code implementations • 20 Oct 2022 • Jaehui Hwang, Dongyoon Han, Byeongho Heo, Song Park, Sanghyuk Chun, Jong-Seok Lee
In recent years, many deep neural architectures have been developed for image classification.
2 code implementations • 7 Apr 2022 • Sanghyuk Chun, Wonjae Kim, Song Park, Minsuk Chang, Seong Joon Oh
Image-Text matching (ITM) is a common task for evaluating the quality of Vision and Language (VL) models.
2 code implementations • 22 Dec 2021 • Song Park, Sanghyuk Chun, Junbum Cha, Bado Lee, Hyunjung Shim
Existing methods learn to disentangle style and content elements by developing a universal style representation for each font style.
no code implementations • 24 Aug 2021 • Sanghyuk Chun, Song Park
Hence, StyleAugment lets the model observe abundant confounding cues for each image through an on-the-fly augmentation strategy, while the augmented images are more realistic than artistic style-transferred images.
4 code implementations • ICCV 2021 • Song Park, Sanghyuk Chun, Junbum Cha, Bado Lee, Hyunjung Shim
MX-Font extracts multiple style features not explicitly conditioned on component labels, but automatically by multiple experts to represent different local concepts, e.g., left-side sub-glyph.
3 code implementations • 23 Sep 2020 • Song Park, Sanghyuk Chun, Junbum Cha, Bado Lee, Hyunjung Shim
However, learning component-wise styles solely from reference glyphs is infeasible in the few-shot font generation scenario, when a target script has a large number of components, e.g., over 200 for Chinese.