Search Results for author: Wonjae Kim

Found 19 papers, 11 papers with code

Language-only Efficient Training of Zero-shot Composed Image Retrieval

1 code implementation · 4 Dec 2023 · Geonmo Gu, Sanghyuk Chun, Wonjae Kim, Yoohoon Kang, Sangdoo Yun

Our LinCIR (Language-only training for CIR) can be trained on text datasets alone via a novel self-supervision technique named self-masking projection (SMP); see the sketch below.

Image Retrieval · Retrieval · +1
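The snippet names SMP without describing it, so here is a minimal sketch of the idea as we read it, offered under our own assumptions rather than as LinCIR's actual code: a small projection head maps a frozen text encoder's latent back into token-embedding space, keyword tokens in a caption are replaced by that projected vector, and the original and masked captions are pulled toward the same latent. TextEncoderStub, proj, the keyword mask, and the cosine loss are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoderStub(nn.Module):
    """Hypothetical stand-in for a frozen CLIP-style text tower."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.mix = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)

    def embed(self, ids):            # token ids -> token embeddings
        return self.tok(ids)

    def encode(self, emb):           # token embeddings -> one text latent
        return self.mix(emb).mean(dim=1)

dim = 64
encoder = TextEncoderStub(dim=dim).eval()
for p in encoder.parameters():       # frozen: training is language-only
    p.requires_grad_(False)

proj = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

ids = torch.randint(0, 1000, (8, 16))              # a batch of captions
keyword_mask = torch.zeros(8, 16, dtype=torch.bool)
keyword_mask[:, 3] = True                          # pretend token 3 is a keyword

emb = encoder.embed(ids)
z = encoder.encode(emb)                            # latent of the original caption

# Self-masking: swap keyword embeddings for the projected latent, then
# require the masked caption to encode back to the same latent.
z_tok = proj(z).unsqueeze(1).expand_as(emb)
masked = torch.where(keyword_mask.unsqueeze(-1), z_tok, emb)
z_masked = encoder.encode(masked)

loss = (1 - F.cosine_similarity(z, z_masked)).mean()
loss.backward()                                    # updates only `proj`
```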

STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment

no code implementations · 12 Oct 2023 · Jaewoo Lee, Jaehong Yoon, Wonjae Kim, Yunji Kim, Sung Ju Hwang

Continuously learning a variety of audio-video semantics over time is crucial for audio-related reasoning tasks in our ever-evolving world.

Continual Learning · Representation Learning · +1

Computational Approaches for App-to-App Retrieval and Design Consistency Check

no code implementations · 19 Sep 2023 · SeokHyeon Park, Wonjae Kim, Young-Ho Kim, Jinwook Seo

Extracting semantic representations from mobile user interfaces (UIs) and using those representations in designers' decision-making processes has shown potential as an effective computational design support tool.

Decision Making · Retrieval

What Do Self-Supervised Vision Transformers Learn?

1 code implementation · 1 May 2023 · Namuk Park, Wonjae Kim, Byeongho Heo, Taekyung Kim, Sangdoo Yun

We present a comparative study on how and why contrastive learning (CL) and masked image modeling (MIM) differ in their representations and in their performance on downstream tasks.

Contrastive Learning

SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage

1 code implementation · ICCV 2023 · Song Park, Sanghyuk Chun, Byeongho Heo, Wonjae Kim, Sangdoo Yun

We need billion-scale images to achieve more generalizable and ground-breaking vision models, as well as massive dataset storage to ship the images (e.g., the LAION-4B dataset needs 240TB of storage).

Continual Learning

UniXGen: A Unified Vision-Language Model for Multi-View Chest X-ray Generation and Report Generation

1 code implementation · 23 Feb 2023 · Hyungyung Lee, Da Young Lee, Wonjae Kim, Jin-Hwa Kim, Tackeun Kim, Jihang Kim, Leonard Sunwoo, Edward Choi

We also find that view-specific special tokens can distinguish between different views and properly generate specific views even when they do not exist in the dataset, and that utilizing multi-view chest X-rays faithfully captures abnormal findings in the additional X-rays; such view tokens are sketched below.

Language Modelling · Quantization
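The abstract credits view control to view-specific special tokens. As an illustration only (the token names and sequence layout below are our guesses, not UniXGen's vocabulary), prepending a learned per-view embedding to the generation prompt is one common way to implement such tokens:

```python
import torch
import torch.nn as nn

VIEWS = ["[AP]", "[PA]", "[LATERAL]"]   # hypothetical view-token names

class ViewPrompt(nn.Module):
    """Prepends a learned per-view embedding to a token sequence."""
    def __init__(self, dim=512):
        super().__init__()
        self.view_emb = nn.Embedding(len(VIEWS), dim)

    def forward(self, seq, view):       # seq: (B, N, dim) report/image tokens
        idx = torch.full((seq.size(0),), VIEWS.index(view), dtype=torch.long)
        v = self.view_emb(idx).unsqueeze(1)      # (B, 1, dim)
        # Every generated token can now attend to the requested view.
        return torch.cat([v, seq], dim=1)

prompt = ViewPrompt()
tokens = torch.randn(2, 32, 512)
conditioned = prompt(tokens, "[LATERAL]")        # (2, 33, 512)
```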

Group Generalized Mean Pooling for Vision Transformer

no code implementations · 8 Dec 2022 · Byungsoo Ko, Han-Gyu Kim, Byeongho Heo, Sangdoo Yun, Sanghyuk Chun, Geonmo Gu, Wonjae Kim

Since ViT already groups channels via its multi-head attention mechanism, grouping channels with GGeM lowers head-wise dependence while amplifying important channels in the activation maps (see the sketch below).

Image Retrieval · Representation Learning · +1
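GeM pooling raises activations to a learnable power p before averaging, and the snippet says GGeM applies this per channel group. A minimal sketch, assuming ViT tokens of shape (batch, tokens, channels) and one learnable exponent per group; the class name and shapes are our assumptions, not the paper's reference code:

```python
import torch
import torch.nn as nn

class GroupGeM(nn.Module):
    """Generalized Mean (GeM) pooling applied per channel group.

    GeM computes (mean_i x_i^p)^(1/p): p = 1 is average pooling and
    p -> infinity approaches max pooling. Channels are split into
    `groups` groups, each with its own learnable exponent.
    """
    def __init__(self, channels, groups, eps=1e-6):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.p = nn.Parameter(torch.full((groups,), 3.0))
        self.eps = eps

    def forward(self, x):                    # x: (B, N, C) ViT tokens
        b, n, c = x.shape
        x = x.clamp(min=self.eps)            # GeM needs positive inputs
        x = x.view(b, n, self.groups, c // self.groups)
        p = self.p.view(1, 1, self.groups, 1)
        pooled = x.pow(p).mean(dim=1).pow(1.0 / p.squeeze(1))
        return pooled.reshape(b, c)          # one descriptor per image

tokens = torch.randn(2, 197, 768)            # e.g. ViT-B/16 output
pool = GroupGeM(channels=768, groups=12)     # 12 groups ~ 12 heads
desc = pool(tokens)                          # (2, 768)
```

Setting `groups` equal to the number of attention heads mirrors the head-wise grouping the snippet mentions.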

Correlation between Alignment-Uniformity and Performance of Dense Contrastive Representations

1 code implementation · 17 Oct 2022 · Jong Hak Moon, Wonjae Kim, Edward Choi

Recently, dense contrastive learning has shown superior performance on dense prediction tasks compared to instance-level contrastive learning.

Contrastive Learning
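The alignment and uniformity in the title are the standard representation-quality metrics of Wang and Isola (2020): alignment measures how close positive pairs are, and uniformity measures how evenly features spread over the unit hypersphere. A minimal sketch of both metrics (the dense, per-pixel pairing studied in the paper is elided; plain feature vectors are scored here):

```python
import torch
import torch.nn.functional as F

def alignment(x, y, alpha=2):
    """Mean powered distance between positive pairs; lower is better."""
    return (x - y).norm(dim=1).pow(alpha).mean()

def uniformity(x, t=2):
    """Log mean Gaussian potential over all pairs; lower is better."""
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()

# Both metrics assume L2-normalized features on the unit hypersphere.
x = F.normalize(torch.randn(512, 128), dim=1)             # anchors
y = F.normalize(x + 0.1 * torch.randn(512, 128), dim=1)   # positives

print(alignment(x, y).item(), uniformity(x).item())
```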

ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

5 code implementations · 5 Feb 2021 · Wonjae Kim, Bokyung Son, Ildoo Kim

Vision-and-Language Pre-training (VLP) has improved performance on various joint vision-and-language downstream tasks.

Cross-Modal Retrieval · Image Retrieval · +5

Diversified Mutual Learning for Deep Metric Learning

no code implementations · 9 Sep 2020 · Wonpyo Park, Wonjae Kim, Kihyun You, Minsu Cho

Mutual learning is an ensemble training strategy that improves generalization by training multiple models simultaneously and transferring their individual knowledge to one another; the canonical objective is sketched below.

Metric Learning · Transfer Learning
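The snippet describes mutual learning only in general terms. In the canonical formulation (deep mutual learning), each model minimizes its supervised loss plus a KL term that pulls its predictions toward each peer's. A minimal two-model sketch; the diversification this paper adds on top (per its title) is deliberately left out:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

net_a, net_b = nn.Linear(32, 10), nn.Linear(32, 10)   # toy peer models
opt = torch.optim.SGD(
    list(net_a.parameters()) + list(net_b.parameters()), lr=0.1)

x = torch.randn(64, 32)
y = torch.randint(0, 10, (64,))

def mutual_loss(own, peer, target):
    """Supervised CE plus KL toward the (detached) peer's predictions."""
    ce = F.cross_entropy(own, target)
    kl = F.kl_div(F.log_softmax(own, dim=1),
                  F.softmax(peer.detach(), dim=1),
                  reduction="batchmean")
    return ce + kl

logits_a, logits_b = net_a(x), net_b(x)
loss = mutual_loss(logits_a, logits_b, y) + mutual_loss(logits_b, logits_a, y)

opt.zero_grad()
loss.backward()
opt.step()
```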

Discrete Infomax Codes for Supervised Representation Learning

no code implementations · 28 May 2019 · Yoonho Lee, Wonjae Kim, Wonpyo Park, Seungjin Choi

In this paper we present a model that produces Discrete InfoMax Codes (DIMCO); we learn a probabilistic encoder that yields k-way d-dimensional codes associated with input data.

Meta-Learning · Metric Learning · +2
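The abstract itself pins down the output format: d-dimensional codes whose entries are each k-way categorical. A minimal encoder sketch under that reading; the paper's mutual-information objective is omitted, and all sizes here are arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteCodeEncoder(nn.Module):
    """Probabilistic encoder emitting d code variables, each k-way."""
    def __init__(self, in_dim=128, d=8, k=16):
        super().__init__()
        self.d, self.k = d, k
        self.net = nn.Linear(in_dim, d * k)

    def forward(self, x):
        logits = self.net(x).view(-1, self.d, self.k)
        probs = F.softmax(logits, dim=-1)    # a distribution per code slot
        codes = probs.argmax(dim=-1)         # hard d-dimensional code
        return probs, codes

enc = DiscreteCodeEncoder()
probs, codes = enc(torch.randn(4, 128))
print(codes.shape)    # torch.Size([4, 8]); each entry is in {0, ..., 15}
```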
