no code implementations • 21 Mar 2024 • Sumin Lee, Yooseung Wang, Sangmin Woo, Changick Kim
Panoramic Activity Recognition (PAR) seeks to identify diverse human activities across different scales, from individual actions to social group and global activities in crowded panoramic scenes.
1 code implementation • 14 Mar 2024 • Byeongjun Park, Hyojun Go, Jin-Young Kim, Sangmin Woo, Seokil Ham, Changick Kim
To achieve this, we employ a sparse mixture-of-experts within each transformer block to utilize semantic information and facilitate handling conflicts in tasks through parameter isolation.
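The sparse mixture-of-experts idea can be illustrated with a minimal top-1 routing sketch; all names here are hypothetical and this is not the paper's implementation, only the general mechanism of routing each token to a single expert so experts' parameters stay isolated:

```python
import numpy as np

def sparse_moe(x, gate_w, expert_ws):
    """Top-1 sparse mixture-of-experts: a gate scores every expert per
    token, and each token is processed only by its highest-scoring expert.

    x:         (tokens, dim) token features
    gate_w:    (dim, n_experts) gating weights
    expert_ws: list of (dim, dim) per-expert weight matrices
    """
    logits = x @ gate_w                    # (tokens, n_experts) gate scores
    top = np.argmax(logits, axis=-1)       # index of the top-1 expert per token
    out = np.zeros_like(x)
    for e, w in enumerate(expert_ws):
        mask = top == e
        if mask.any():                     # only the chosen expert runs
            out[mask] = x[mask] @ w
    return out
```

Because each token touches exactly one expert, conflicting tasks can occupy different experts without interfering with one another's parameters.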
1 code implementation • 26 Dec 2023 • Sangmin Woo, Byeongjun Park, Hyojun Go, Jin-Young Kim, Changick Kim
This work introduces HarmonyView, a simple yet effective diffusion sampling technique adept at decomposing two intricate aspects in single-image 3D generation: consistency and diversity.
no code implementations • 21 Nov 2023 • Sumin Lee, Sangmin Woo, Muhammad Adi Nugroho, Changick Kim
CFEM incorporates separate learnable query embeddings for each modality, which guide CFEM to extract complementary information and global action content from the other modalities.
2 code implementations • 11 Oct 2023 • Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim
Diffusion models generate highly realistic images by learning a multi-step denoising process, naturally embodying the principles of multi-task learning (MTL).
no code implementations • ICCV 2023 • Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Changick Kim
To address this issue, we propose Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temporally important parts of a video.
1 code implementation • 2 Apr 2023 • Sangmin Woo, So-Yeong Jeon, Jinyoung Park, Minji Son, Sumin Lee, Changick Kim
We introduce Sketch-based Video Object Localization (SVOL), a new task aimed at localizing spatio-temporal object boxes in video queried by the input sketch.
1 code implementation • 25 Nov 2022 • Sangmin Woo, Sumin Lee, Yeonju Park, Muhammad Adi Nugroho, Changick Kim
We ask: how can we train a model that is robust to missing modalities?
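One common way to build such robustness (a generic technique, not necessarily the paper's method) is modality dropout: randomly zeroing out entire input streams during training so the model cannot over-rely on any single modality. A minimal sketch, with hypothetical modality names:

```python
import random

def drop_modalities(sample, p_drop=0.3):
    """Randomly zero out whole modalities of a training sample.

    sample: dict mapping modality name -> feature vector (list of floats).
    Each modality is dropped independently with probability p_drop, but at
    least one modality is always kept so the sample stays informative.
    """
    names = list(sample)
    dropped = {m for m in names if random.random() < p_drop}
    if len(dropped) == len(names):
        dropped.discard(random.choice(names))  # never drop everything
    return {m: ([0.0] * len(v) if m in dropped else v)
            for m, v in sample.items()}
```

At test time, a missing modality then looks like a pattern the model has already seen during training.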
no code implementations • 31 Aug 2022 • JeongSoo Kim, Sangmin Woo, Byeongjun Park, Changick Kim
Camera traps, unmanned observation devices, and deep learning-based image recognition systems have greatly reduced human effort in collecting and analyzing wildlife images.
no code implementations • 24 Aug 2022 • Sumin Lee, Sangmin Woo, Yeonju Park, Muhammad Adi Nugroho, Changick Kim
In multi-modal action recognition, it is important to consider not only the complementary nature of different modalities but also global action content.
1 code implementation • Applied Sciences 2022 • HeeSeung Jung, Kangil Kim, Jong-Hun Shin, Seung-Hoon Na, SangKeun Jung, Sangmin Woo
Most neural machine translation models are implemented within a conditional language model framework composed of an encoder and a decoder.
1 code implementation • 25 Jan 2022 • Sangmin Woo, Jinyoung Park, Inyong Koo, Sumin Lee, Minki Jeong, Changick Kim
To our surprise, we found that the training schedule shows a divide-and-conquer-like pattern: time segments are first diversified regardless of the target, then coupled with each target, and finally fine-tuned to the target again.
1 code implementation • 15 Jul 2021 • Sangmin Woo, Junhyug Noh, Kangil Kim
TSPN tells when to look: it simultaneously predicts the start-end timestamps (i.e., temporal spans) and categories of all possible relations by utilizing the full video context.
Ranked #2 on Video Visual Relation Detection on ImageNet-VidVRD
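The joint prediction of temporal spans and relation categories can be sketched as a simple two-branch head over per-relation query embeddings; the shapes and weight names below are illustrative assumptions, not TSPN's actual architecture:

```python
import numpy as np

def predict_spans(queries, w_span, w_cls):
    """For each query embedding, jointly predict a normalized (start, end)
    temporal span in [0, 1] and a distribution over relation categories.

    queries: (n, dim); w_span: (dim, 2); w_cls: (dim, n_classes)
    """
    raw = 1.0 / (1.0 + np.exp(-(queries @ w_span)))  # sigmoid -> [0, 1]
    start = np.minimum(raw[:, 0], raw[:, 1])         # enforce start <= end
    end = np.maximum(raw[:, 0], raw[:, 1])
    logits = queries @ w_cls
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)            # softmax over categories
    return np.stack([start, end], axis=-1), probs
```

Predicting both outputs from the same query lets the span and the category decision share full-video context, which is the intuition the abstract describes.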
1 code implementation • 16 Jun 2021 • Sangmin Woo, Junhyug Noh, Kangil Kim
To quantify how much LOGIN is aware of relational direction, a new diagnostic task called Bidirectional Relationship Classification (BRC) is also proposed.
Ranked #1 on Scene Graph Classification on Visual Genome (Recall@20 metric)