Search Results for author: Suha Kwak

Found 61 papers, 26 papers with code

ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation

no code implementations 5 Dec 2024 Dayoung Gong, Suha Kwak, Minsu Cho

In this work, we tackle these two problems, action segmentation and action anticipation, jointly using a unified diffusion model dubbed ActFusion.

Action Anticipation Action Segmentation +3

Bootstrapping Top-down Information for Self-modulating Slot Attention

no code implementations 4 Nov 2024 Dongwon Kim, Seoyeon Kim, Suha Kwak

Object-centric learning (OCL) aims to learn representations of individual objects within visual scenes without manual supervision, facilitating efficient and effective visual reasoning.

Object Object Discovery +1

PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery

no code implementations 20 Sep 2024 Jicheol Park, Dongwon Kim, Boseung Jeong, Suha Kwak

Text-based person search, employing free-form text queries to identify individuals within a vast image collection, presents a unique challenge in aligning visual and textual representations, particularly at the human part level.

Person Search Retrieval +1

Improving Robustness to Multiple Spurious Correlations by Multi-Objective Optimization

no code implementations 5 Sep 2024 Nayeong Kim, Juwon Kang, Sungsoo Ahn, Jungseul Ok, Suha Kwak

We study the problem of training an unbiased and accurate model given a dataset with multiple biases.

Efficient and Versatile Robust Fine-Tuning of Zero-shot Models

no code implementations 11 Aug 2024 Sungyeon Kim, Boseung Jeong, Donghyun Kim, Suha Kwak

Large-scale image-text pre-trained models enable zero-shot classification and provide consistent accuracy across various data distributions.

Cross-Modal Retrieval Zero-Shot Learning

Online Temporal Action Localization with Memory-Augmented Transformer

no code implementations 6 Aug 2024 Youngkil Song, Dongkeun Kim, Minsu Cho, Suha Kwak

We also propose a novel action localization method that observes the current input segment to predict the end time of the ongoing action and accesses the memory queue to estimate the start time of the action.

Temporal Action Localization

FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions

no code implementations 18 Jul 2024 Sohyun Lee, Namyup Kim, Sungyeon Kim, Suha Kwak

To address this challenging task in practical scenarios where labeled normal condition images are not accessible in training, we propose FREST, a novel feature restoration framework for source-free domain adaptation (SFDA) of semantic segmentation to adverse conditions.

Segmentation Semantic Segmentation +1

Extreme Point Supervised Instance Segmentation

no code implementations CVPR 2024 Hyeonjun Lee, Sehyun Hwang, Suha Kwak

Our work considers extreme points as a part of the true instance mask and propagates them to identify potential foreground and background points, which are all together used for training a pseudo label generator.

Point-Supervised Instance Segmentation Pseudo Label +2

Distilling Diffusion Models into Conditional GANs

no code implementations 9 May 2024 Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park

We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality.

Image-to-Image Translation

Active Label Correction for Semantic Segmentation with Foundation Models

1 code implementation 16 Mar 2024 Hoyoung Kim, Sehyun Hwang, Suha Kwak, Jungseul Ok

Training and validating models for semantic segmentation require datasets with pixel-wise annotations, which are notoriously labor-intensive.

Semantic Segmentation Superpixels

Activity Grammars for Temporal Action Segmentation

1 code implementation NeurIPS 2023 Dayoung Gong, Joonseok Lee, Deunsol Jung, Suha Kwak, Minsu Cho

Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties.

Action Segmentation Segmentation +1

Towards More Practical Group Activity Detection: A New Benchmark and Model

no code implementations 5 Dec 2023 Dongkeun Kim, Youngkil Song, Minsu Cho, Suha Kwak

Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video.

Action Detection Activity Detection

Universal Metric Learning with Parameter-Efficient Transfer Learning

no code implementations 16 Sep 2023 Sungyeon Kim, Donghyun Kim, Suha Kwak

In this regard, we introduce a novel metric learning paradigm, called Universal Metric Learning (UML), which learns a unified distance metric capable of capturing relations across multiple data distributions.

Metric Learning Transfer Learning

Shatter and Gather: Learning Referring Image Segmentation with Text Supervision

1 code implementation ICCV 2023 Dongwon Kim, Namyup Kim, Cuiling Lan, Suha Kwak

Referring image segmentation, the task of segmenting any arbitrary entities described in free-form texts, opens up a variety of vision applications.

Image Segmentation Segmentation +2

SYNAuG: Exploiting Synthetic Data for Data Imbalance Problems

no code implementations 2 Aug 2023 Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Nayeong Kim, Suha Kwak, Tae-Hyun Oh

Data imbalance in training data often leads to biased predictions from trained models, which in turn causes ethical and social issues.

Fairness

PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

1 code implementation ICCV 2023 Junhyeong Cho, Gilhyun Nam, Sungyeon Kim, Hunmin Yang, Suha Kwak

In a joint vision-language space, a text feature (e.g., from "a photo of a dog") could effectively represent its relevant image features (e.g., from dog photos).

Image Classification Multi-modal Classification +5

Extending CLIP's Image-Text Alignment to Referring Image Segmentation

no code implementations 14 Jun 2023 Seoyeon Kim, Minguk Kang, Dongwon Kim, Jaesik Park, Suha Kwak

Referring Image Segmentation (RIS) is a cross-modal task that aims to segment an instance described by a natural language expression.

Image Segmentation Referring Expression Segmentation +2

HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization

no code implementations CVPR 2023 Sungyeon Kim, Boseung Jeong, Suha Kwak

Supervision for metric learning has long been given in the form of equivalence between human-labeled classes.

Metric Learning

Learning to Detect Semantic Boundaries with Image-level Class Labels

no code implementations 15 Dec 2022 Namyup Kim, Sehyun Hwang, Suha Kwak

This paper presents the first attempt to learn semantic boundary detection using image-level class labels as supervision.

Boundary Detection Image Classification +1

Improving Cross-Modal Retrieval with Set of Diverse Embeddings

1 code implementation CVPR 2023 Dongwon Kim, Namyup Kim, Suha Kwak

It seeks to encode a sample into a set of different embedding vectors that capture different semantics of the sample.

Cross-Modal Retrieval Retrieval

Few-shot Metric Learning: Online Adaptation of Embedding for Retrieval

no code implementations 14 Nov 2022 Deunsol Jung, Dahyun Kang, Suha Kwak, Minsu Cho

Metric learning aims to build a distance metric typically by learning an effective embedding function that maps similar objects into nearby points in its embedding space.

Image Retrieval Meta-Learning +2

Combating Label Distribution Shift for Active Domain Adaptation

no code implementations 13 Aug 2022 Sehyun Hwang, Sohyun Lee, Sungyeon Kim, Jungseul Ok, Suha Kwak

We consider the problem of active domain adaptation (ADA) to unlabeled target data, a subset of which is actively selected and labeled given a budget constraint.

Domain Adaptation

Learning Debiased Classifier with Biased Committee

1 code implementation 22 Jun 2022 Nayeong Kim, Sehyun Hwang, Sungsoo Ahn, Jaesik Park, Suha Kwak

We propose a new method for training debiased classifiers with no spurious attribute label.

Attribute

Self-Taught Metric Learning without Labels

no code implementations CVPR 2022 Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak

At the heart of our framework lies an algorithm that investigates contexts of data on the embedding space to predict their class-equivalence relations as pseudo labels.

Metric Learning

Detector-Free Weakly Supervised Group Activity Recognition

no code implementations CVPR 2022 Dongkeun Kim, Jinsung Lee, Minsu Cho, Suha Kwak

Group activity recognition is the task of understanding the activity conducted by a group of people as a whole in a multi-person video.

Group Activity Recognition

Semi-supervised Semantic Segmentation with Error Localization Network

1 code implementation CVPR 2022 Donghyeon Kwon, Suha Kwak

This paper studies semi-supervised learning of semantic segmentation, which assumes that only a small portion of training images are labeled and the others remain unlabeled.

Contrastive Learning Segmentation +1

Reflection and Rotation Symmetry Detection via Equivariant Learning

1 code implementation CVPR 2022 Ahyun Seo, Byungjin Kim, Suha Kwak, Minsu Cho

The inherent challenge of detecting symmetries stems from arbitrary orientations of symmetry patterns; a reflection symmetry mirrors itself against an axis with a specific orientation while a rotation symmetry matches its rotated copy with a specific orientation.

Symmetry Detection

ReSTR: Convolution-free Referring Image Segmentation Using Transformers

no code implementations CVPR 2022 Namyup Kim, Dongwon Kim, Cuiling Lan, Wenjun Zeng, Suha Kwak

Most existing methods for this task rely heavily on convolutional neural networks, which, however, have trouble capturing long-range dependencies between entities in the language expression and are not flexible enough for modeling interactions between the two different modalities.

Image Segmentation Referring Expression Segmentation +2

Collaborative Transformers for Grounded Situation Recognition

3 code implementations CVPR 2022 Junhyeong Cho, Youngseok Yoon, Suha Kwak

To implement this idea, we propose Collaborative Glance-Gaze TransFormer (CoFormer) that consists of two modules: Glance transformer for activity classification and Gaze transformer for entity estimation.

Grounded Situation Recognition Image Classification +4

Learning to Generate Novel Classes for Deep Metric Learning

no code implementations 4 Jan 2022 Kyungmoon Lee, Sungyeon Kim, Seunghoon Hong, Suha Kwak

Motivated by this, we introduce a new data augmentation approach that synthesizes novel classes and their embedding vectors.

Data Augmentation Metric Learning

Style Neophile: Constantly Seeking Novel Styles for Domain Generalization

no code implementations CVPR 2022 Juwon Kang, Sohyun Lee, Namyup Kim, Suha Kwak

Existing methods in this direction suppose that a domain can be characterized by styles of its images, and train a network using style-augmented data so that the network is not biased to particular style distributions.

Domain Generalization Representation Learning

Grounded Situation Recognition with Transformers

1 code implementation 19 Nov 2021 Junhyeong Cho, Youngseok Yoon, Hyeonjun Lee, Suha Kwak

Grounded Situation Recognition (GSR) is the task that not only classifies a salient action (verb), but also predicts entities (nouns) associated with semantic roles and their locations in the given image.

Decoder Grounded Situation Recognition +5

Cross Domain Ensemble Distillation for Domain Generalization

no code implementations 29 Sep 2021 Kyungmoon Lee, Sungyeon Kim, Suha Kwak

For domain generalization, the task of learning a model that generalizes to unseen target domains utilizing multiple source domains, many approaches explicitly align the distribution of the domains.

Domain Generalization Image Classification

WEDGE: Web-Image Assisted Domain Generalization for Semantic Segmentation

no code implementations 29 Sep 2021 Namyup Kim, Taeyoung Son, Jaehyun Pahk, Cuiling Lan, Wenjun Zeng, Suha Kwak

We also present a method which injects styles of the web-crawled images into training images on-the-fly during training, which enables the network to experience images of diverse styles with reliable labels for effective training.

Diversity Domain Generalization +2

ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer

1 code implementation ICCV 2021 Boseung Jeong, Jicheol Park, Suha Kwak

Attribute-based person search is the task of finding person images that are best matched with a set of text attributes given as query.

Attribute Person Search

Embedding Transfer with Label Relaxation for Improved Metric Learning

2 code implementations CVPR 2021 Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak

Our method exploits pairwise similarities between samples in the source embedding space as the knowledge, and transfers them through a loss used for learning target embedding models.

Knowledge Distillation Metric Learning

Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition

1 code implementation ICCV 2021 Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho

With a sufficient volume of the neighborhood in space and time, it effectively captures long-term interaction and fast motion in the video, leading to robust action recognition.

Ranked #19 on Action Recognition on Something-Something V1 (using extra training data)

Action Recognition Temporal Action Localization +1

Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition

1 code implementation 1 Jan 2021 Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho

We leverage the whole volume of STSS and let our model learn to extract an effective motion representation from it.

Action Recognition Video Understanding

Embedding Transfer via Smooth Contrastive Loss

no code implementations 1 Jan 2021 Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak

To this end, we design a new loss called smooth contrastive loss, which pulls together or pushes apart a pair of samples in a target embedding space with strength determined by their semantic similarity in the source embedding space; an analysis of the loss reveals that this property enables more important pairs to contribute more to learning the target embedding space.

Metric Learning Semantic Similarity +1

MotionSqueeze: Neural Motion Feature Learning for Video Understanding

2 code implementations ECCV 2020 Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho

As the frame-by-frame optical flows require heavy computation, incorporating motion information has remained a major computational bottleneck for video understanding.

Action Classification Action Recognition +2

URIE: Universal Image Enhancement for Visual Recognition in the Wild

1 code implementation 17 Jul 2020 Taeyoung Son, Juwon Kang, Namyup Kim, Sunghyun Cho, Suha Kwak

Despite the great advances in visual recognition, it has been witnessed that recognition models trained on clean images of common datasets are not robust against distorted images in the real world.

Image Enhancement

Proxy Anchor Loss for Deep Metric Learning

3 code implementations CVPR 2020 Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak

The former class can leverage fine-grained semantic relations between data points, but slows convergence in general due to its high training complexity.

Ranked #10 on Metric Learning on CUB-200-2011 (using extra training data)

Fine-Grained Image Classification Fine-Grained Vehicle Classification +1

Domain-Specific Batch Normalization for Unsupervised Domain Adaptation

1 code implementation CVPR 2019 Woong-Gi Chang, Tackgeun You, Seonguk Seo, Suha Kwak, Bohyung Han

In the first stage, we estimate pseudo-labels for the examples in the target domain using an external unsupervised domain adaptation algorithm (for example, MSTN or CPUA) integrating the proposed domain-specific batch normalization.

Unsupervised Domain Adaptation

Deep Metric Learning Beyond Binary Supervision

1 code implementation CVPR 2019 Sungyeon Kim, Minkyo Seo, Ivan Laptev, Minsu Cho, Suha Kwak

Metric Learning for visual similarity has mostly adopted binary supervision indicating whether a pair of images are of the same class or not.

Image Captioning Image Retrieval +5

Universal Bounding Box Regression and Its Applications

no code implementations 15 Apr 2019 Seungkwan Lee, Suha Kwak, Minsu Cho

Bounding-box regression is a popular technique to refine or predict localization boxes in recent object detection approaches.

Object object-detection +3

Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations

6 code implementations CVPR 2019 Jiwoon Ahn, Sunghyun Cho, Suha Kwak

For generating the pseudo labels, we first identify confident seed areas of object classes from attention maps of an image classification model, and propagate them to discover the entire instance areas with accurate boundaries.

Image Classification Image-level Supervised Instance Segmentation +2

Weakly Supervised Semantic Segmentation using Web-Crawled Videos

no code implementations CVPR 2017 Seunghoon Hong, Donghun Yeo, Suha Kwak, Honglak Lee, Bohyung Han

Our goal is to overcome this limitation with no additional human intervention by retrieving videos relevant to target class labels from a web repository, and generating segmentation labels from the retrieved videos to simulate strong supervision for semantic segmentation.

Image Classification Segmentation +2

Thin-Slicing for Pose: Learning to Understand Pose Without Explicit Pose Estimation

no code implementations CVPR 2016 Suha Kwak, Minsu Cho, Ivan Laptev

We address the problem of learning a pose-aware, compact embedding that projects images with similar human poses to be placed close-by in the embedding space.

Action Recognition Image Retrieval +4

Unsupervised Object Discovery and Tracking in Video Collections

no code implementations ICCV 2015 Suha Kwak, Minsu Cho, Ivan Laptev, Jean Ponce, Cordelia Schmid

This paper addresses the problem of automatically localizing dominant objects as spatio-temporal tubes in a noisy collection of videos with minimal or even no supervision.

Object Object Discovery +1

Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network

no code implementations 24 Feb 2015 Seunghoon Hong, Tackgeun You, Suha Kwak, Bohyung Han

We propose an online visual tracking algorithm that learns a discriminative saliency map using a Convolutional Neural Network (CNN).

Visual Tracking

Unsupervised Object Discovery and Localization in the Wild: Part-based Matching with Bottom-up Region Proposals

no code implementations CVPR 2015 Minsu Cho, Suha Kwak, Cordelia Schmid, Jean Ponce

This paper addresses unsupervised discovery and localization of dominant objects from a noisy image collection with multiple object classes.

Object Object Discovery

Object Localization based on Structural SVM using Privileged Information

no code implementations NeurIPS 2014 Jan Feyereisl, Suha Kwak, Jeany Son, Bohyung Han

We propose a structured prediction algorithm for object localization based on Support Vector Machines (SVMs) using privileged information.

Object Object Localization +1

Multi-agent Event Detection: Localization and Role Assignment

no code implementations CVPR 2013 Suha Kwak, Bohyung Han, Joon Hee Han

We present a joint estimation technique of event localization and role assignment when the target video event is described by a scenario.

Event Detection
