Search Results for author: Joonmyung Choi

Found 7 papers, 6 papers with code

vid-TLDR: Training Free Token merging for Light-weight Video Transformer

1 code implementation • CVPR 2024 • Joonmyung Choi, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, Hyunwoo J. Kim

To tackle these issues, we propose training-free token merging for lightweight video Transformers (vid-TLDR), which aims to enhance the efficiency of video Transformers by merging background tokens without additional training.

Ranked #2 on Video Retrieval on SSv2-template retrieval (using extra training data)

Action Recognition • Computational Efficiency • +5
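The sketch below is not the released vid-TLDR code; it is a minimal PyTorch illustration of training-free, saliency-guided token merging, assuming a per-token saliency score (e.g., attention mass from the [CLS] query) is already available. The keep-top-k-and-average-the-rest rule and all names are illustrative.

```python
import torch

def merge_background_tokens(tokens, saliency, keep_ratio=0.5):
    """Training-free token merging sketch: keep the most salient tokens and
    collapse the remaining (background) tokens into one summary token.

    tokens:   (B, N, D) patch/frame token embeddings
    saliency: (B, N) per-token saliency, e.g. attention mass from the [CLS] query
    """
    B, N, D = tokens.shape
    n_keep = max(1, int(N * keep_ratio))
    order = saliency.argsort(dim=1, descending=True)
    keep_idx, merge_idx = order[:, :n_keep], order[:, n_keep:]

    keep = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    rest = torch.gather(tokens, 1, merge_idx.unsqueeze(-1).expand(-1, -1, D))

    # saliency-weighted average of the background tokens -> single summary token
    w = torch.gather(saliency, 1, merge_idx).softmax(dim=1).unsqueeze(-1)
    summary = (rest * w).sum(dim=1, keepdim=True)
    return torch.cat([keep, summary], dim=1)            # (B, n_keep + 1, D)

if __name__ == "__main__":
    x = torch.randn(2, 196, 768)                        # dummy video tokens
    s = torch.rand(2, 196)                              # dummy saliency scores
    print(merge_background_tokens(x, s).shape)          # torch.Size([2, 99, 768])
```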

Concept Bottleneck with Visual Concept Filtering for Explainable Medical Image Classification

no code implementations • 23 Aug 2023 • Injae Kim, Jongha Kim, Joonmyung Choi, Hyunwoo J. Kim

However, those methods do not consider whether a concept is visually relevant or not, which is an important factor in computing meaningful concept scores.

Image Classification • Medical Image Classification
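A minimal sketch of filtering concepts by visual relevance, assuming concept phrases and reference images are already embedded in a shared space (e.g., by a CLIP-style encoder). The mean-cosine-similarity score and top-k cutoff are illustrative choices, not the paper's exact criterion.

```python
import torch
import torch.nn.functional as F

def filter_visual_concepts(concept_emb, image_emb, top_k=32):
    """Keep only concepts that are visually grounded, scored by mean cosine
    similarity between each concept embedding and a set of image embeddings.

    concept_emb: (C, D) embeddings of candidate concept phrases
    image_emb:   (M, D) embeddings of reference images
    """
    concept_emb = F.normalize(concept_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    relevance = (concept_emb @ image_emb.T).mean(dim=1)   # (C,) visual relevance
    keep = relevance.topk(top_k).indices                  # indices of kept concepts
    return keep, relevance

if __name__ == "__main__":
    concepts = torch.randn(200, 512)                      # dummy concept features
    images = torch.randn(1000, 512)                       # dummy image features
    idx, rel = filter_visual_concepts(concepts, images)
    print(idx.shape, rel.shape)          # torch.Size([32]) torch.Size([200])
```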

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

1 code implementation • CVPR 2023 • Dohwan Ko, Joonmyung Choi, Hyeong Kyu Choi, Kyoung-Woon On, Byungseok Roh, Hyunwoo J. Kim

Therefore, we propose MEta Loss TRansformer (MELTR), a plug-in module that automatically and non-linearly combines various loss functions to aid learning of the target task via auxiliary learning.

Auxiliary Learning • Multimodal Sentiment Analysis • +10
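A rough sketch of a plug-in module that non-linearly combines several loss values with a tiny Transformer. The class name, dimensions, and sum readout are assumptions for illustration; the bilevel meta-learning used to train MELTR is not shown.

```python
import torch
import torch.nn as nn

class LossCombiner(nn.Module):
    """Tiny Transformer that maps a set of auxiliary loss values to one scalar
    training loss, combining them non-linearly (hypothetical plug-in sketch)."""

    def __init__(self, n_losses, d_model=32, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)
        self.loss_id = nn.Parameter(torch.randn(n_losses, d_model))  # one token per loss
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=64, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, 1)

    def forward(self, losses):
        # losses: (B, n_losses) raw values of the individual pretraining objectives
        tok = self.embed(losses.unsqueeze(-1)) + self.loss_id   # (B, n_losses, d)
        tok = self.encoder(tok)
        return self.head(tok).sum(dim=(1, 2))                   # (B,) combined loss

if __name__ == "__main__":
    combiner = LossCombiner(n_losses=4)
    raw = torch.rand(8, 4)       # e.g. captioning, matching, MLM, contrastive losses
    print(combiner(raw).shape)   # torch.Size([8])
```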

TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers

1 code implementation • 14 Oct 2022 • Hyeong Kyu Choi, Joonmyung Choi, Hyunwoo J. Kim

To this end, we propose TokenMixup, an efficient attention-guided token-level data augmentation method that aims to maximize the saliency of a mixed set of tokens.

Data Augmentation • Image Classification
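A simplified sketch of attention-guided token-level mixup: each sample's least salient tokens are overwritten with the most salient tokens of a randomly paired sample, and labels are mixed by the token ratio. The actual TokenMixup assignment is more involved; names and ratios here are illustrative.

```python
import torch

def token_mixup(tokens, labels, saliency, mix_ratio=0.3):
    """Attention-guided token-level mixup sketch: the least salient tokens of
    each sample are replaced by the most salient tokens of a paired sample.

    tokens:   (B, N, D) token embeddings
    labels:   (B, C) one-hot or soft labels
    saliency: (B, N) per-token saliency (e.g. [CLS] attention)
    """
    B, N, D = tokens.shape
    n_mix = int(N * mix_ratio)
    perm = torch.randperm(B)

    tgt_idx = saliency.argsort(dim=1)[:, :n_mix]                         # least salient
    src_idx = saliency[perm].argsort(dim=1, descending=True)[:, :n_mix]  # most salient

    mixed = tokens.clone()
    src = torch.gather(tokens[perm], 1, src_idx.unsqueeze(-1).expand(-1, -1, D))
    mixed.scatter_(1, tgt_idx.unsqueeze(-1).expand(-1, -1, D), src)

    lam = 1.0 - n_mix / N                                # label mixing ratio
    return mixed, lam * labels + (1 - lam) * labels[perm]

if __name__ == "__main__":
    x = torch.randn(4, 197, 384)
    y = torch.eye(10)[torch.randint(0, 10, (4,))]
    s = torch.rand(4, 197)
    xm, ym = token_mixup(x, y, s)
    print(xm.shape, ym.shape)   # torch.Size([4, 197, 384]) torch.Size([4, 10])
```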

Video-Text Representation Learning via Differentiable Weak Temporal Alignment

1 code implementation • CVPR 2022 • Dohwan Ko, Joonmyung Choi, Juyeon Ko, Shinyeong Noh, Kyoung-Woon On, Eun-Sol Kim, Hyunwoo J. Kim

In this paper, we propose a novel multi-modal self-supervised framework, Video-Text Temporally Weak Alignment-based Contrastive Learning (VT-TWINS), to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW).

Contrastive Learning • Dynamic Time Warping • +1
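A generic soft-DTW sketch showing how the hard minimum in the DTW recursion can be replaced with a soft minimum (log-sum-exp) so that the alignment cost is differentiable. VT-TWINS uses its own DTW variant tailored to weakly correlated pairs; treat this purely as background illustration.

```python
import torch

def soft_dtw(cost, gamma=0.1):
    """Soft (differentiable) dynamic time warping over a pairwise cost matrix.

    cost: (T, S) distances between T video-clip features and S sentence features.
    The hard min of classic DTW is replaced by -gamma * logsumexp(-x / gamma),
    keeping the alignment cost differentiable w.r.t. the cost matrix.
    """
    T, S = cost.shape
    inf = torch.tensor(float("inf"))
    # Python table of scalar tensors avoids in-place writes that break autograd
    R = [[inf] * (S + 1) for _ in range(T + 1)]
    R[0][0] = torch.tensor(0.0)
    for i in range(1, T + 1):
        for j in range(1, S + 1):
            prev = torch.stack([R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]])
            R[i][j] = cost[i - 1, j - 1] - gamma * torch.logsumexp(-prev / gamma, dim=0)
    return R[T][S]

if __name__ == "__main__":
    video = torch.randn(8, 256, requires_grad=True)   # 8 clip features
    text = torch.randn(5, 256)                        # 5 sentence features
    loss = soft_dtw(torch.cdist(video, text))         # scalar alignment cost
    loss.backward()
    print(loss.item(), video.grad.shape)
```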
