Search Results for author: Sangmin Woo

Found 14 papers, 9 papers with code

Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition

no code implementations • 21 Mar 2024 • Sumin Lee, Yooseung Wang, Sangmin Woo, Changick Kim

Panoramic Activity Recognition (PAR) seeks to identify diverse human activities across different scales, from individual actions to social group and global activities in crowded panoramic scenes.

Activity Recognition

Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

1 code implementation • 14 Mar 2024 • Byeongjun Park, Hyojun Go, Jin-Young Kim, Sangmin Woo, Seokil Ham, Changick Kim

To achieve this, we employ a sparse mixture-of-experts within each transformer block to utilize semantic information and facilitate handling conflicts in tasks through parameter isolation.

Denoising • Multi-Task Learning

HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

1 code implementation • 26 Dec 2023 • Sangmin Woo, Byeongjun Park, Hyojun Go, Jin-Young Kim, Changick Kim

This work introduces HarmonyView, a simple yet effective diffusion sampling technique adept at decomposing two intricate aspects in single-image 3D generation: consistency and diversity.

3D Generation • Image to 3D

Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition

no code implementations • 21 Nov 2023 • Sumin Lee, Sangmin Woo, Muhammad Adi Nugroho, Changick Kim

CFEM incorporates separate learnable query embeddings for each modality, which guide it to extract complementary information and global action content from the other modalities.

Action Recognition

Denoising Task Routing for Diffusion Models

2 code implementations • 11 Oct 2023 • Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim

Diffusion models generate highly realistic images by learning a multi-step denoising process, naturally embodying the principles of multi-task learning (MTL).

Denoising • Multi-Task Learning

Audio-Visual Glance Network for Efficient Video Recognition

no code implementations • ICCV 2023 • Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Changick Kim

To address this issue, we propose Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temporally important parts of a video.

Video Recognition • Video Understanding

Sketch-based Video Object Localization

1 code implementation • 2 Apr 2023 • Sangmin Woo, So-Yeong Jeon, Jinyoung Park, Minji Son, Sumin Lee, Changick Kim

We introduce Sketch-based Video Object Localization (SVOL), a new task aimed at localizing spatio-temporal object boxes in video queried by the input sketch.

Object • Object Localization • +1

Temporal Flow Mask Attention for Open-Set Long-Tailed Recognition of Wild Animals in Camera-Trap Images

no code implementations • 31 Aug 2022 • JeongSoo Kim, Sangmin Woo, Byeongjun Park, Changick Kim

Camera traps, unmanned observation devices, and deep learning-based image recognition systems have greatly reduced human effort in collecting and analyzing wildlife images.

Optical Flow Estimation

Modality Mixer for Multi-modal Action Recognition

no code implementations • 24 Aug 2022 • Sumin Lee, Sangmin Woo, Yeonju Park, Muhammad Adi Nugroho, Changick Kim

In multi-modal action recognition, it is important to consider not only the complementary nature of different modalities but also global action content.

Action Recognition

Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos

1 code implementation • 25 Jan 2022 • Sangmin Woo, Jinyoung Park, Inyong Koo, Sumin Lee, Minki Jeong, Changick Kim

To our surprise, we found that the training schedule shows a divide-and-conquer-like pattern: time segments are first diversified regardless of the target, then coupled with each target, and finally fine-tuned to the target again.

Natural Language Queries • Sentence • +2

What and When to Look?: Temporal Span Proposal Network for Video Relation Detection

1 code implementation • 15 Jul 2021 • Sangmin Woo, Junhyug Noh, Kangil Kim

TSPN tells when to look: it simultaneously predicts start-end timestamps (i.e., temporal spans) and categories of all possible relations by utilizing the full video context.

Relation • Video Visual Relation Detection • +1

Tackling the Challenges in Scene Graph Generation with Local-to-Global Interactions

1 code implementation • 16 Jun 2021 • Sangmin Woo, Junhyug Noh, Kangil Kim

To quantify how much LOGIN is aware of relational direction, a new diagnostic task called Bidirectional Relationship Classification (BRC) is also proposed.

 Ranked #1 on Scene Graph Classification on Visual Genome (Recall@20 metric)

Bidirectional Relationship Classification • Graph Generation • +4
