1 code implementation • 24 Oct 2024 • Sanghyuk Chun, Wonjae Kim, Song Park, Sangdoo Yun
Vision-language models (VLMs) embed aligned image-text pairs into a joint space but often rely on deterministic embeddings, assuming a one-to-one correspondence between images and texts.
1 code implementation • 8 Jul 2024 • Yujin Jeong, Yunji Kim, Sanghyuk Chun, Jiyoung Lee
Despite the impressive progress of multimodal generative models, video-to-audio generation still suffers from limited performance and lacks the flexibility to prioritize sound synthesis for specific objects within the scene.
Ranked #7 on Video-to-Sound Generation on VGG-Sound
1 code implementation • 13 Jun 2024 • Jaeseok Byun, Seokhyeon Jeong, Wonjae Kim, Sanghyuk Chun, Taesup Moon
To that end, we introduce Reducing Task Discrepancy of text encoders for Composed Image Retrieval (RTD), a plug-and-play training scheme for the text encoder that enhances its capability using a novel target-anchored text contrastive learning.
1 code implementation • 26 Apr 2024 • Wonjae Kim, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, Sangdoo Yun
In an era where the volume of data drives the effectiveness of self-supervised learning, the specificity and clarity of data semantics play a crucial role in model training.
no code implementations • 27 Mar 2024 • Jungbeom Lee, Sanghyuk Chun, Sangdoo Yun
Recent Vision-Language Pre-training (VLP) models have demonstrated significant advancements.
1 code implementation • CVPR 2024 • Geonmo Gu, Sanghyuk Chun, Wonjae Kim, Yoohoon Kang, Sangdoo Yun
Our LinCIR (Language-only training for CIR) can be trained only with text datasets by a novel self-supervision named self-masking projection (SMP).
1 code implementation • 4 Dec 2023 • Geonmo Gu, Sanghyuk Chun, Wonjae Kim, Yoohoon Kang, Sangdoo Yun
Our LinCIR (Language-only training for CIR) can be trained only with text datasets by a novel self-supervision named self-masking projection (SMP).
1 code implementation • 20 Oct 2023 • Taekyung Kim, Sanghyuk Chun, Byeongho Heo, Dongyoon Han
Masked image modeling (MIM) methods such as the Masked Autoencoder (MAE) learn strong representations by randomly masking input tokens for the encoder to process, with the decoder reconstructing the masked tokens back to the input.
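The random masking step common to such MIM methods can be sketched in a few lines. This is a minimal illustration, not the paper's code; the function name and return convention are assumptions:

```python
import numpy as np

def random_masking(tokens, mask_ratio=0.75, rng=None):
    """Keep a random subset of input tokens for the encoder (a sketch of
    MAE-style random masking). Returns the visible tokens plus the sorted
    indices needed to scatter decoder reconstructions back into place."""
    rng = rng or np.random.default_rng()
    n = tokens.shape[0]
    keep = int(n * (1 - mask_ratio))          # e.g., 25% of tokens stay visible
    perm = rng.permutation(n)                 # random ordering of token indices
    visible_idx = np.sort(perm[:keep])        # indices of unmasked tokens
    return tokens[visible_idx], visible_idx
```

The encoder only ever sees the `visible_idx` subset, which is what makes MAE-style pre-training cheap relative to processing the full sequence.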
1 code implementation • 29 May 2023 • Sanghyuk Chun
The Image-Text Matching (ITM) task, a fundamental vision-language (VL) task, suffers from inherent ambiguity arising from multiplicity and imperfect annotations.
1 code implementation • 21 Apr 2023 • Seulki Park, Daeho Um, Hajung Yoon, Sanghyuk Chun, Sangdoo Yun, Jin Young Choi
With the extensive use of vision-language models in various downstream tasks, evaluating their robustness is crucial.
1 code implementation • 10 Apr 2023 • Gyeongsik Moon, Hongsuk Choi, Sanghyuk Chun, Jiyoung Lee, Sangdoo Yun
Recovering 3D human mesh in the wild is greatly challenging as in-the-wild (ITW) datasets provide only 2D pose ground truths (GTs).
Ranked #6 on 3D Multi-Person Pose Estimation on MuPoTS-3D
1 code implementation • 21 Mar 2023 • Geonmo Gu, Sanghyuk Chun, Wonjae Kim, HeeJae Jun, Yoohoon Kang, Sangdoo Yun
This paper proposes a novel diffusion-based model, CompoDiff, for solving zero-shot Composed Image Retrieval (ZS-CIR) with latent diffusion.
1 code implementation • ICCV 2023 • Song Park, Sanghyuk Chun, Byeongho Heo, Wonjae Kim, Sangdoo Yun
We need billion-scale images to achieve more generalizable and ground-breaking vision models, as well as massive dataset storage to ship the images (e.g., the LAION-4B dataset needs 240TB of storage space).
no code implementations • 1 Mar 2023 • Sangwon Jung, TaeEon Park, Sanghyuk Chun, Taesup Moon
Many existing group-fairness-aware training methods aim to achieve group fairness either by re-weighting underrepresented groups based on certain rules or by using weakly approximated surrogates for the fairness metrics as regularization terms in the objective.
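The rule-based re-weighting mentioned above can be sketched as inverse-frequency weighting, under which every group contributes equal total weight regardless of its size. A minimal illustration; the actual rules in the literature vary, and the helper name is ours:

```python
import numpy as np

def group_reweight(groups):
    """Per-sample weights inversely proportional to group frequency (a sketch
    of rule-based re-weighting for underrepresented groups). After
    normalization the mean weight is 1 and each group carries equal total
    weight in the training objective."""
    groups = np.asarray(groups)
    vals, counts = np.unique(groups, return_counts=True)
    freq = dict(zip(vals.tolist(), counts.tolist()))
    w = np.array([1.0 / freq[g] for g in groups.tolist()])
    return w * len(groups) / w.sum()   # normalize so the mean weight is 1
```

For groups `[0, 0, 0, 1]` the three majority samples and the single minority sample end up with the same total weight, which is the balancing effect such rules aim for.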
no code implementations • 8 Dec 2022 • Byungsoo Ko, Han-Gyu Kim, Byeongho Heo, Sangdoo Yun, Sanghyuk Chun, Geonmo Gu, Wonjae Kim
As ViT groups the channels via a multi-head attention mechanism, grouping the channels by GGeM leads to lower head-wise dependence while amplifying important channels on the activation maps.
no code implementations • 20 Oct 2022 • Jaehui Hwang, Dongyoon Han, Byeongho Heo, Song Park, Sanghyuk Chun, Jong-Seok Lee
In recent years, many deep neural architectures have been developed for image classification.
1 code implementation • 21 Aug 2022 • Chanwoo Park, Sangdoo Yun, Sanghyuk Chun
Our theoretical results show that regardless of the choice of the mixing strategy, MSDA behaves as a pixel-level regularization of the underlying training loss and a regularization of the first layer parameters.
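The simplest member of the MSDA family analyzed above is mixup, which convex-combines two inputs and their labels with a Beta-distributed mixing ratio. A minimal sketch for illustration, not the paper's code:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Mixup, the simplest mixed sample data augmentation (MSDA):
    draw lambda ~ Beta(alpha, alpha) and convex-combine both the
    inputs and the (one-hot) labels with the same lambda."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

Under the paper's analysis, training on such mixed pairs acts like a pixel-level regularizer of the underlying loss, independent of which mixing strategy (mixup, CutMix, etc.) produced them.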
1 code implementation • 17 Apr 2022 • Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang
Transformers have been widely used in numerous vision problems especially for visual recognition and detection.
2 code implementations • 7 Apr 2022 • Sanghyuk Chun, Wonjae Kim, Song Park, Minsuk Chang, Seong Joon Oh
Image-Text matching (ITM) is a common task for evaluating the quality of Vision and Language (VL) models.
1 code implementation • 21 Mar 2022 • Junbum Cha, Kyungjae Lee, Sungrae Park, Sanghyuk Chun
Domain generalization (DG) aims to learn a model that generalizes to an unseen target domain using only limited source domains.
Ranked #3 on Domain Generalization on TerraIncognita
2 code implementations • 7 Feb 2022 • Saehyung Lee, Sanghyuk Chun, Sangwon Jung, Sangdoo Yun, Sungroh Yoon
However, in this study, we prove that existing DC methods can perform worse than random selection when task-irrelevant information forms a significant part of the training dataset.
2 code implementations • 22 Dec 2021 • Song Park, Sanghyuk Chun, Junbum Cha, Bado Lee, Hyunjung Shim
Existing methods learn to disentangle style and content elements by developing a universal style representation for each font style.
1 code implementation • CVPR 2022 • Sangwon Jung, Sanghyuk Chun, Taesup Moon
To address this problem, we propose a simple Confidence-based Group Label assignment (CGL) strategy that is readily applicable to any fairness-aware learning method.
1 code implementation • ICLR 2022 • Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang
Transformers are transforming the landscape of computer vision, especially for recognition tasks.
Ranked #16 on Object Detection on COCO 2017 val
no code implementations • ICLR 2022 • Luca Scimeca, Seong Joon Oh, Sanghyuk Chun, Michael Poli, Sangdoo Yun
This phenomenon, also known as shortcut learning, is emerging as a key limitation of the current generation of machine learning models.
no code implementations • 29 Sep 2021 • Saehyung Lee, Hyungyu Lee, Sanghyuk Chun, Sungroh Yoon
Several recent studies have shown that the use of extra in-distribution data can lead to a high level of adversarial robustness.
no code implementations • 24 Aug 2021 • Sanghyuk Chun, Song Park
Hence, StyleAugment lets the model observe abundant confounding cues for each image via an on-the-fly augmentation strategy, while the augmented images remain more realistic than artistic style-transferred images.
no code implementations • NeurIPS 2021 • Michael Poli, Stefano Massaroli, Luca Scimeca, Seong Joon Oh, Sanghyuk Chun, Atsushi Yamashita, Hajime Asama, Jinkyoo Park, Animesh Garg
Effective control and prediction of dynamical systems often require appropriate handling of continuous-time and discrete, event-triggered processes.
4 code implementations • ICCV 2021 • Song Park, Sanghyuk Chun, Junbum Cha, Bado Lee, Hyunjung Shim
MX-Font extracts multiple style features not explicitly conditioned on component labels, but automatically by multiple experts to represent different local concepts, e.g., the left-side sub-glyph.
10 code implementations • ICCV 2021 • Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh
We empirically show that such a spatial dimension reduction is beneficial to a transformer architecture as well, and propose a novel Pooling-based Vision Transformer (PiT) upon the original ViT model.
Ranked #368 on Image Classification on ImageNet
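The spatial dimension reduction described above can be sketched as a pooling step over the token grid: reshape the `(h*w, c)` token sequence back to 2D, average-pool, and flatten again. This is illustrative only; the actual PiT design also expands channels via a depthwise convolution and handles the class token separately:

```python
import numpy as np

def pool_tokens(tokens, h, w):
    """Spatial token pooling for a pooling-based ViT (a sketch): reshape an
    (h*w, c) token sequence into its 2D grid, average-pool 2x2, and return
    the (h/2 * w/2, c) sequence. Assumes h and w are even."""
    c = tokens.shape[-1]
    grid = tokens.reshape(h, w, c)
    # group each 2x2 neighborhood and average it away
    pooled = grid.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    return pooled.reshape(-1, c)
```

Inserting such a stage between transformer blocks gives the ResNet-like shrinking spatial resolution that the paper argues benefits transformer architectures as well.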
4 code implementations • NeurIPS 2021 • Junbum Cha, Sanghyuk Chun, Kyungjae Lee, Han-Cheol Cho, Seunghyun Park, Yunsung Lee, Sungrae Park
Domain generalization (DG) methods aim to achieve generalizability to an unseen target domain by using only training data from the source domains.
Ranked #25 on Domain Generalization on TerraIncognita
2 code implementations • CVPR 2021 • Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Junsuk Choe, Sanghyuk Chun
However, they have not fixed the training set, presumably because of a formidable annotation cost.
Ranked #20 on Image Classification on OmniBenchmark
4 code implementations • CVPR 2021 • Sanghyuk Chun, Seong Joon Oh, Rafael Sampaio de Rezende, Yannis Kalantidis, Diane Larlus
Instead, we propose to use Probabilistic Cross-Modal Embedding (PCME), where samples from the different modalities are represented as probabilistic distributions in the common embedding space.
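As a rough illustration of the probabilistic-embedding idea (not the paper's reference implementation), each sample can be represented as a diagonal Gaussian in the joint space and the match probability estimated by Monte-Carlo sampling. The function name, the sigmoid-of-distance parameterization, and the parameters `a`, `b`, `k` below are assumptions for the sketch:

```python
import numpy as np

def match_prob(mu_i, sig_i, mu_t, sig_t, a=1.0, b=0.0, k=8, rng=None):
    """Monte-Carlo matching probability between two Gaussian embeddings
    with diagonal covariance (a sketch of probabilistic cross-modal
    embedding): sample k vectors from each side and average a sigmoid of
    the negative pairwise distance."""
    rng = rng or np.random.default_rng(0)
    zi = mu_i + sig_i * rng.standard_normal((k, mu_i.size))   # image samples
    zt = mu_t + sig_t * rng.standard_normal((k, mu_t.size))   # text samples
    d = np.linalg.norm(zi[:, None, :] - zt[None, :, :], axis=-1)  # k x k dists
    return float(np.mean(1.0 / (1.0 + np.exp(a * d - b))))
```

Because each sample is a distribution rather than a point, a single image can plausibly match several captions (and vice versa), which deterministic embeddings cannot express.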
3 code implementations • 23 Sep 2020 • Song Park, Sanghyuk Chun, Junbum Cha, Bado Lee, Hyunjung Shim
However, learning component-wise styles solely from reference glyphs is infeasible in the few-shot font generation scenario, when a target script has a large number of components, e.g., over 200 for Chinese.
2 code implementations • 8 Jul 2020 • Junsuk Choe, Seong Joon Oh, Sanghyuk Chun, Seungho Lee, Zeynep Akata, Hyunjung Shim
In this paper, we argue that the WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to a small held-out set that does not overlap with the test set.
4 code implementations • ICLR 2021 • Byeongho Heo, Sanghyuk Chun, Seong Joon Oh, Dongyoon Han, Sangdoo Yun, Gyuwan Kim, Youngjung Uh, Jung-Woo Ha
Because of the scale invariance, this modification only alters the effective step sizes without changing the effective update directions, thus enjoying the original convergence properties of GD optimizers.
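The scale-invariance argument can be illustrated by the projection at its core: for a scale-invariant weight (e.g., one followed by batch normalization), the gradient is orthogonal to the weight, so removing the radial component of an update changes only the weight norm growth, not the effective update direction. A hedged sketch of that projection; the helper name is ours:

```python
import numpy as np

def project_radial(update, w, eps=1e-8):
    """Remove the radial (norm-growing) component of an optimizer update for
    a scale-invariant weight w: keep only the part orthogonal to w. A sketch
    of the projection idea behind the paper's modified GD optimizers."""
    w_unit = w / (np.linalg.norm(w) + eps)          # unit radial direction
    return update - np.dot(update, w_unit) * w_unit  # tangential part only
```

Since the discarded component only rescales `w`, and the function is scale-invariant, the loss trajectory is unaffected while the effective step size stops shrinking as the norm grows.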
3 code implementations • ECCV 2020 • Junbum Cha, Sanghyuk Chun, Gayoung Lee, Bado Lee, Seonghyeon Kim, Hwalsuk Lee
By utilizing the compositionality of compositional scripts, we propose a novel font generation framework, named Dual Memory-augmented Font Generation Network (DM-Font), which enables us to generate a high-quality font library with only a few samples.
no code implementations • 9 Mar 2020 • Sanghyuk Chun, Seong Joon Oh, Sangdoo Yun, Dongyoon Han, Junsuk Choe, Youngjoon Yoo
Despite apparent human-level performances of deep neural networks (DNN), they behave fundamentally differently from humans.
2 code implementations • CVPR 2020 • Junsuk Choe, Seong Joon Oh, Seungho Lee, Sanghyuk Chun, Zeynep Akata, Hyunjung Shim
In this paper, we argue that the WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to a small held-out set that does not overlap with the test set.
no code implementations • 11 Nov 2019 • Minz Won, Sanghyuk Chun, Xavier Serra
Recently, we proposed a self-attention based music tagging model.
Sound Audio and Speech Processing
no code implementations • 15 Oct 2019 • YoungJoon Yoo, Sanghyuk Chun, Sangdoo Yun, Jung-Woo Ha, Jaejun Yoo
We first assume that the priors of future samples can be generated in an independently and identically distributed (i.i.d.) manner.
3 code implementations • ICML 2020 • Hyojin Bahng, Sanghyuk Chun, Sangdoo Yun, Jaegul Choo, Seong Joon Oh
This tactic is feasible in many scenarios where it is much easier to define a set of biased representations than to define and quantify bias.
2 code implementations • 12 Jun 2019 • Minz Won, Sanghyuk Chun, Xavier Serra
In addition, we demonstrate the interpretability of the proposed architecture with a heat map visualization.
Sound Audio and Speech Processing
30 code implementations • ICCV 2019 • Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo
Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
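The regional dropout variant this entry refers to can be sketched in a few lines: cut a random patch from one training image, paste it into another, and mix the labels in proportion to the pasted area. A minimal NumPy sketch of the technique (the `cutmix` helper and its signature are illustrative, not the paper's reference code):

```python
import numpy as np

def cutmix(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Cut a random patch from image x2 and paste it into x1.

    x1, x2: HxWxC image arrays; y1, y2: one-hot label vectors.
    Returns the mixed image and a label mixed by the pasted-area ratio."""
    rng = rng or np.random.default_rng()
    h, w = x1.shape[:2]
    lam = rng.beta(alpha, alpha)            # target mixing ratio
    cut = np.sqrt(1.0 - lam)                # patch side length ratio
    ch, cw = int(h * cut), int(w * cut)
    cy, cx = rng.integers(h), rng.integers(w)
    y0, y1b = np.clip([cy - ch // 2, cy + ch // 2], 0, h)
    x0, x1b = np.clip([cx - cw // 2, cx + cw // 2], 0, w)
    mixed = x1.copy()
    mixed[y0:y1b, x0:x1b] = x2[y0:y1b, x0:x1b]
    # recompute lambda from the exact pasted area after boundary clipping
    lam = 1.0 - (y1b - y0) * (x1b - x0) / (h * w)
    return mixed, lam * y1 + (1.0 - lam) * y2
```

Unlike plain regional dropout, no pixels are discarded: the removed region is filled with informative content from another image, and the label reflects exactly how much of each image survives.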
no code implementations • ICLR 2019 • Jisung Hwang, Younghoon Kim, Sanghyuk Chun, Jaejun Yoo, Ji-Hoon Kim, Dongyoon Han, Jung-Woo Ha
The checkerboard phenomenon is one of the well-known visual artifacts in the computer vision field.
4 code implementations • ICCV 2019 • Jaejun Yoo, Youngjung Uh, Sanghyuk Chun, Byeongkyu Kang, Jung-Woo Ha
The key ingredient of our method is wavelet transforms that naturally fits in deep networks.
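A one-level 2D Haar transform illustrates why wavelets naturally fit in deep networks: the decomposition into sub-bands is exactly invertible, unlike lossy max-pooling, so spatial information survives a round trip through pooling and unpooling. A minimal sketch assuming even-sized inputs; not the paper's reference code:

```python
import numpy as np

def haar2d(x):
    """One level of the 2D Haar wavelet transform: split an even-sized 2D
    array into low/high-pass sub-bands (LL, LH, HL, HH)."""
    a = (x[0::2] + x[1::2]) / 2   # vertical low-pass
    d = (x[0::2] - x[1::2]) / 2   # vertical high-pass
    ll, lh = (a[:, 0::2] + a[:, 1::2]) / 2, (a[:, 0::2] - a[:, 1::2]) / 2
    hl, hh = (d[:, 0::2] + d[:, 1::2]) / 2, (d[:, 0::2] - d[:, 1::2]) / 2
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d: recombine the four sub-bands."""
    h, w = ll.shape
    a = np.empty((h, 2 * w)); d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((2 * h, 2 * w))
    x[0::2], x[1::2] = a + d, a - d
    return x
```

Replacing max-pooling/unpooling with such a transform pair lets a stylization network downsample without destroying the detail needed for photorealistic reconstruction.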
1 code implementation • 21 Dec 2018 • Jang-Hyun Kim, Jaejun Yoo, Sanghyuk Chun, Adrian Kim, Jung-Woo Ha
We present a hybrid framework that leverages the trade-off between temporal and frequency precision in audio representations to improve the performance of speech enhancement task.
Sound Audio and Speech Processing
no code implementations • 5 Mar 2015 • Sanghyuk Chun, Yung-Kyun Noh, Jinwoo Shin
Subspace clustering (SC) is a popular method for dimensionality reduction of high-dimensional data, which generalizes Principal Component Analysis (PCA).