Search Results for author: Sangmin Woo

Found 14 papers, 9 papers with code

Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition

no code implementations • 21 Mar 2024 • Sumin Lee, Yooseung Wang, Sangmin Woo, Changick Kim

Panoramic Activity Recognition (PAR) seeks to identify diverse human activities across different scales, from individual actions to social group and global activities in crowded panoramic scenes.

Activity Recognition

Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

1 code implementation • 14 Mar 2024 • Byeongjun Park, Hyojun Go, Jin-Young Kim, Sangmin Woo, Seokil Ham, Changick Kim

To achieve this, we employ a sparse mixture-of-experts within each transformer block to utilize semantic information and facilitate handling conflicts in tasks through parameter isolation.

Denoising • Multi-Task Learning

HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

1 code implementation • 26 Dec 2023 • Sangmin Woo, Byeongjun Park, Hyojun Go, Jin-Young Kim, Changick Kim

This work introduces HarmonyView, a simple yet effective diffusion sampling technique adept at decomposing two intricate aspects in single-image 3D generation: consistency and diversity.

3D Generation • Image to 3D

Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition

no code implementations • 21 Nov 2023 • Sumin Lee, Sangmin Woo, Muhammad Adi Nugroho, Changick Kim

CFEM incorporates separate learnable query embeddings for each modality, which guide it to extract complementary information and global action content from the other modalities.

Action Recognition

Denoising Task Routing for Diffusion Models

2 code implementations • 11 Oct 2023 • Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim

Diffusion models generate highly realistic images by learning a multi-step denoising process, naturally embodying the principles of multi-task learning (MTL).

Denoising • Multi-Task Learning

Audio-Visual Glance Network for Efficient Video Recognition

no code implementations • ICCV 2023 • Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Changick Kim

To address this issue, we propose Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temporally important parts of a video.

Video Recognition • Video Understanding

Sketch-based Video Object Localization

1 code implementation • 2 Apr 2023 • Sangmin Woo, So-Yeong Jeon, Jinyoung Park, Minji Son, Sumin Lee, Changick Kim

We introduce Sketch-based Video Object Localization (SVOL), a new task aimed at localizing spatio-temporal object boxes in video queried by the input sketch.

Object • Object Localization • +1

Temporal Flow Mask Attention for Open-Set Long-Tailed Recognition of Wild Animals in Camera-Trap Images

no code implementations • 31 Aug 2022 • JeongSoo Kim, Sangmin Woo, Byeongjun Park, Changick Kim

Camera traps, unmanned observation devices, and deep learning-based image recognition systems have greatly reduced human effort in collecting and analyzing wildlife images.

Optical Flow Estimation

Modality Mixer for Multi-modal Action Recognition

no code implementations • 24 Aug 2022 • Sumin Lee, Sangmin Woo, Yeonju Park, Muhammad Adi Nugroho, Changick Kim

In multi-modal action recognition, it is important to consider not only the complementary nature of different modalities but also global action content.

Action Recognition

Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos

1 code implementation • 25 Jan 2022 • Sangmin Woo, Jinyoung Park, Inyong Koo, Sumin Lee, Minki Jeong, Changick Kim

To our surprise, we found that the training schedule shows a divide-and-conquer-like pattern: time segments are first diversified regardless of the target, then coupled with each target, and finally fine-tuned to the target again.

Natural Language Queries • Sentence • +2

What and When to Look?: Temporal Span Proposal Network for Video Relation Detection

1 code implementation • 15 Jul 2021 • Sangmin Woo, Junhyug Noh, Kangil Kim

TSPN tells when to look: it simultaneously predicts start-end timestamps (i.e., temporal spans) and categories of all possible relations by utilizing the full video context.

Relation • Video Visual Relation Detection • +1

Tackling the Challenges in Scene Graph Generation with Local-to-Global Interactions

1 code implementation • 16 Jun 2021 • Sangmin Woo, Junhyug Noh, Kangil Kim

To quantify how much LOGIN is aware of relational direction, a new diagnostic task called Bidirectional Relationship Classification (BRC) is also proposed.

 Ranked #1 on Scene Graph Classification on Visual Genome (Recall@20 metric)

Bidirectional Relationship Classification • Graph Generation • +4
