no code implementations • 24 Feb 2025 • Kiran Ramnath, Kang Zhou, Sheng Guan, Soumya Smruti Mishra, Xuan Qi, Zhengyuan Shen, Shuai Wang, Sangmin Woo, Sullam Jeoung, Yawei Wang, Haozhu Wang, Han Ding, Yuzhe Lu, Zhichao Xu, Yun Zhou, Balasubramaniam Srinivasan, Qiaojing Yan, Yueyan Chen, Haibo Ding, Panpan Xu, Lin Lee Cheong
Since the advent of large language models (LLMs), prompt engineering has been a crucial step for eliciting desired responses for various Natural Language Processing (NLP) tasks.
no code implementations • 21 Nov 2024 • Seokil Ham, Hee-Seon Kim, Sangmin Woo, Changick Kim
Despite the growing interest in Mamba architecture as a potential replacement for Transformer architecture, parameter-efficient fine-tuning (PEFT) approaches for Mamba remain largely unexplored.
no code implementations • 28 May 2024 • Seokil Ham, Sangmin Woo, Jin-Young Kim, Hyojun Go, Byeongjun Park, Changick Kim
We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters.
no code implementations • 28 May 2024 • Sangmin Woo, Jaehyuk Jang, Donguk Kim, Yubin Choi, Changick Kim
By integrating the probability distributions from both the original and transformed images, RITUAL effectively reduces hallucinations.
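The core idea described here, combining the output distributions induced by an original image and a transformed view of it, can be sketched as follows. This is a minimal illustration with toy logits; the function names, the simple linear mixture, and the `alpha` weight are assumptions for exposition, not the paper's exact formulation:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the token axis.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def ensemble_distributions(logits_original, logits_transformed, alpha=0.5):
    """Mix next-token distributions computed from the original image
    and from a randomly transformed view (e.g., a flip or crop)."""
    p_orig = softmax(logits_original)
    p_trans = softmax(logits_transformed)
    return alpha * p_orig + (1 - alpha) * p_trans

# Toy example over a 4-token vocabulary.
p = ensemble_distributions(np.array([2.0, 1.0, 0.5, 0.1]),
                           np.array([1.5, 1.2, 0.4, 0.2]))
# The convex mixture of two distributions is itself a valid distribution.
```

Because the mixture is convex, the combined output remains a proper probability distribution, so it can be decoded with the usual sampling or greedy strategies.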
no code implementations • 28 May 2024 • Sangmin Woo, Donguk Kim, Jaehyuk Jang, Yubin Choi, Changick Kim
This study addresses the issue observed in Large Vision Language Models (LVLMs), where excessive attention on a few image tokens, referred to as blind tokens, leads to hallucinatory responses in tasks requiring fine-grained understanding of visual objects.
no code implementations • 28 May 2024 • Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Jinyoung Park, Yooseung Wang, Donguk Kim, Changick Kim
The first pathway of the relation module, the actor-centric path, initially captures the temporal dynamics of individual actors and then constructs inter-actor relationships.
no code implementations • 21 Mar 2024 • Sumin Lee, Yooseung Wang, Sangmin Woo, Changick Kim
Panoramic Activity Recognition (PAR) seeks to identify diverse human activities across different scales, from individual actions to social group and global activities in crowded panoramic scenes.
1 code implementation • 14 Mar 2024 • Byeongjun Park, Hyojun Go, Jin-Young Kim, Sangmin Woo, Seokil Ham, Changick Kim
To achieve this, we employ a sparse mixture-of-experts within each transformer block to utilize semantic information and facilitate handling conflicts in tasks through parameter isolation.
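A sparse mixture-of-experts layer of the kind mentioned here routes each token to a small subset of expert networks, so experts specialize and parameters stay isolated across tasks. A minimal top-1-style routing sketch in NumPy (the gate, expert shapes, and sizes are illustrative choices, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_moe(x, gate_w, expert_ws, top_k=1):
    """Route each token (row of x) to its top-k experts and combine
    their outputs, weighted by renormalized gate scores."""
    scores = x @ gate_w                       # (tokens, n_experts)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(scores[t])[-top_k:]  # indices of chosen experts
        w = np.exp(scores[t][top])
        w /= w.sum()                          # softmax over chosen experts only
        for weight, e in zip(w, top):
            out[t] += weight * (x[t] @ expert_ws[e])
    return out

d, n_experts = 8, 4
x = rng.standard_normal((3, d))               # 3 tokens, 8-dim features
gate_w = rng.standard_normal((d, n_experts))
expert_ws = rng.standard_normal((n_experts, d, d))
y = sparse_moe(x, gate_w, expert_ws)          # y.shape == (3, 8)
```

Only `top_k` experts run per token, which is what makes the layer "sparse": compute grows with `top_k`, not with the total number of experts.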
1 code implementation • CVPR 2024 • Sangmin Woo, Byeongjun Park, Hyojun Go, Jin-Young Kim, Changick Kim
This work introduces HarmonyView, a simple yet effective diffusion sampling technique adept at decomposing two intricate aspects in single-image 3D generation: consistency and diversity.
no code implementations • 21 Nov 2023 • Sumin Lee, Sangmin Woo, Muhammad Adi Nugroho, Changick Kim
CFEM incorporates separate learnable query embeddings for each modality, which guide CFEM to extract complementary information and global action content from the other modalities.

2 code implementations • 11 Oct 2023 • Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim
Diffusion models generate highly realistic images by learning a multi-step denoising process, naturally embodying the principles of multi-task learning (MTL).
no code implementations • ICCV 2023 • Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Changick Kim
To address this issue, we propose Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temporally important parts of a video.
1 code implementation • 2 Apr 2023 • Sangmin Woo, So-Yeong Jeon, Jinyoung Park, Minji Son, Sumin Lee, Changick Kim
We introduce Sketch-based Video Object Localization (SVOL), a new task aimed at localizing spatio-temporal object boxes in video queried by the input sketch.
1 code implementation • 25 Nov 2022 • Sangmin Woo, Sumin Lee, Yeonju Park, Muhammad Adi Nugroho, Changick Kim
We ask: how can we train a model that is robust to missing modalities?
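One standard recipe for this kind of robustness (a generic sketch, not necessarily this paper's method) is modality dropout: randomly masking out entire modalities during training so the fusion network learns to cope with any available subset. The function and feature names below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

def modality_dropout(features, drop_p=0.3):
    """Randomly zero out whole modality feature vectors,
    always keeping at least one modality."""
    keep = rng.random(len(features)) > drop_p
    if not keep.any():
        keep[rng.integers(len(features))] = True  # never drop everything
    return [f if k else np.zeros_like(f) for f, k in zip(features, keep)]

# Toy per-modality features (e.g., RGB, depth, audio embeddings).
rgb, depth, audio = (rng.standard_normal(16) for _ in range(3))
masked = modality_dropout([rgb, depth, audio])
```

At test time no masking is applied, but the model has already seen every missing-modality pattern during training.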
no code implementations • 31 Aug 2022 • JeongSoo Kim, Sangmin Woo, Byeongjun Park, Changick Kim
Camera traps, unmanned observation devices, and deep learning-based image recognition systems have greatly reduced human effort in collecting and analyzing wildlife images.
no code implementations • 24 Aug 2022 • Sumin Lee, Sangmin Woo, Yeonju Park, Muhammad Adi Nugroho, Changick Kim
In multi-modal action recognition, it is important to consider not only the complementary nature of different modalities but also global action content.
1 code implementation • Applied Sciences 2022 • HeeSeung Jung, Kangil Kim, Jong-Hun Shin, Seung-Hoon Na, SangKeun Jung, Sangmin Woo
Most neural machine translation models are implemented as a conditional language model framework composed of encoder and decoder models.
1 code implementation • 25 Jan 2022 • Sangmin Woo, Jinyoung Park, Inyong Koo, Sumin Lee, Minki Jeong, Changick Kim
To our surprise, we found that the training schedule shows a divide-and-conquer-like pattern: time segments are first diversified regardless of the target, then coupled with each target, and finally fine-tuned to the target again.
1 code implementation • 15 Jul 2021 • Sangmin Woo, Junhyug Noh, Kangil Kim
TSPN tells when to look: it simultaneously predicts the start-end timestamps (i.e., temporal spans) and categories of all possible relations by utilizing the full video context.
Ranked #2 on Video Visual Relation Detection on ImageNet-VidVRD
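Predicted start-end spans like the ones TSPN outputs are typically scored against ground truth with temporal intersection-over-union; a minimal sketch of that metric (an evaluation convention, not TSPN's internals):

```python
def temporal_iou(span_a, span_b):
    """Intersection-over-union of two (start, end) time spans in seconds."""
    start = max(span_a[0], span_b[0])
    end = min(span_a[1], span_b[1])
    inter = max(0.0, end - start)
    union = (span_a[1] - span_a[0]) + (span_b[1] - span_b[0]) - inter
    return inter / union if union > 0 else 0.0

# Overlap of 2s inside a 6s union -> IoU = 1/3.
score = temporal_iou((0.0, 4.0), (2.0, 6.0))
```

A predicted relation span usually counts as correct when its temporal IoU with the ground-truth span exceeds a threshold such as 0.5.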
1 code implementation • 16 Jun 2021 • Sangmin Woo, Junhyug Noh, Kangil Kim
To quantify how much LOGIN is aware of relational direction, a new diagnostic task called Bidirectional Relationship Classification (BRC) is also proposed.
Ranked #1 on Predicate Classification on Visual Genome (R@100 metric)