no code implementations • 5 Dec 2024 • Dayoung Gong, Suha Kwak, Minsu Cho
In this work, we tackle these two problems, action segmentation and action anticipation, jointly using a unified diffusion model dubbed ActFusion.
no code implementations • 4 Nov 2024 • Dongwon Kim, Seoyeon Kim, Suha Kwak
Object-centric learning (OCL) aims to learn representations of individual objects within visual scenes without manual supervision, facilitating efficient and effective visual reasoning.
no code implementations • 20 Sep 2024 • Jicheol Park, Dongwon Kim, Boseung Jeong, Suha Kwak
Text-based person search, employing free-form text queries to identify individuals within a vast image collection, presents a unique challenge in aligning visual and textual representations, particularly at the human part level.
no code implementations • 5 Sep 2024 • Nayeong Kim, Juwon Kang, Sungsoo Ahn, Jungseul Ok, Suha Kwak
We study the problem of training an unbiased and accurate model given a dataset with multiple biases.
no code implementations • 11 Aug 2024 • Sungyeon Kim, Boseung Jeong, Donghyun Kim, Suha Kwak
Large-scale image-text pre-trained models enable zero-shot classification and provide consistent accuracy across various data distributions.
no code implementations • 6 Aug 2024 • Youngkil Song, Dongkeun Kim, Minsu Cho, Suha Kwak
We also propose a novel action localization method that observes the current input segment to predict the end time of the ongoing action and accesses the memory queue to estimate the start time of the action.
no code implementations • 29 Jul 2024 • Jinsung Lee, Taeoh Kim, Inwoong Lee, Minho Shim, Dongyoon Wee, Minsu Cho, Suha Kwak
Video action detection (VAD) aims to detect actors and classify their actions in a video.
no code implementations • 18 Jul 2024 • Sohyun Lee, Namyup Kim, Sungyeon Kim, Suha Kwak
To address this challenging task in practical scenarios where labeled normal condition images are not accessible in training, we propose FREST, a novel feature restoration framework for source-free domain adaptation (SFDA) of semantic segmentation to adverse conditions.
no code implementations • CVPR 2024 • Hyeonjun Lee, Sehyun Hwang, Suha Kwak
Our work considers extreme points as a part of the true instance mask and propagates them to identify potential foreground and background points, which are all together used for training a pseudo label generator.
no code implementations • 9 May 2024 • Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park
We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality.
1 code implementation • 16 Mar 2024 • Hoyoung Kim, Sehyun Hwang, Suha Kwak, Jungseul Ok
Training and validating models for semantic segmentation require datasets with pixel-wise annotations, which are notoriously labor-intensive.
1 code implementation • NeurIPS 2023 • Dayoung Gong, Joonseok Lee, Deunsol Jung, Suha Kwak, Minsu Cho
Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties.
no code implementations • 5 Dec 2023 • Dongkeun Kim, Youngkil Song, Minsu Cho, Suha Kwak
Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video.
no code implementations • 16 Sep 2023 • Sungyeon Kim, Donghyun Kim, Suha Kwak
In this regard, we introduce a novel metric learning paradigm, called Universal Metric Learning (UML), which learns a unified distance metric capable of capturing relations across multiple data distributions.
1 code implementation • ICCV 2023 • Dongwon Kim, Namyup Kim, Cuiling Lan, Suha Kwak
Referring image segmentation, the task of segmenting any arbitrary entities described in free-form texts, opens up a variety of vision applications.
no code implementations • 2 Aug 2023 • Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Nayeong Kim, Suha Kwak, Tae-Hyun Oh
Data imbalance in training data often leads to biased predictions from trained models, which in turn causes ethical and social issues.
1 code implementation • ICCV 2023 • Junhyeong Cho, Gilhyun Nam, Sungyeon Kim, Hunmin Yang, Suha Kwak
In a joint vision-language space, a text feature (e.g., from "a photo of a dog") could effectively represent its relevant image features (e.g., from dog photos).
Ranked #1 on Domain Generalization on DomainNet
no code implementations • 14 Jun 2023 • Seoyeon Kim, Minguk Kang, Dongwon Kim, Jaesik Park, Suha Kwak
Referring Image Segmentation (RIS) is a cross-modal task that aims to segment an instance described by a natural language expression.
1 code implementation • ICCV 2023 • Hoyoung Kim, Minhyeon Oh, Sehyun Hwang, Suha Kwak, Jungseul Ok
Learning semantic segmentation requires pixel-wise annotations, which can be time-consuming and expensive.
1 code implementation • CVPR 2023 • Sohyun Lee, Jaesung Rim, Boseung Jeong, GeonU Kim, Byungju Woo, Haechan Lee, Sunghyun Cho, Suha Kwak
We study human pose estimation in extremely low-light images.
no code implementations • CVPR 2023 • Sungyeon Kim, Boseung Jeong, Suha Kwak
Supervision for metric learning has long been given in the form of equivalence between human-labeled classes.
no code implementations • 15 Dec 2022 • Namyup Kim, Sehyun Hwang, Suha Kwak
This paper presents the first attempt to learn semantic boundary detection using image-level class labels as supervision.
1 code implementation • CVPR 2023 • Dongwon Kim, Namyup Kim, Suha Kwak
It seeks to encode a sample into a set of different embedding vectors that capture different semantics of the sample.
1 code implementation • ECCV 2022 • Kyungmoon Lee, Sungyeon Kim, Suha Kwak
Domain generalization is the task of learning models that generalize to unseen target domains.
Ranked #5 on Image to sketch recognition on PACS
no code implementations • 14 Nov 2022 • Deunsol Jung, Dahyun Kang, Suha Kwak, Minsu Cho
Metric learning aims to build a distance metric typically by learning an effective embedding function that maps similar objects into nearby points in its embedding space.
no code implementations • 13 Aug 2022 • Sehyun Hwang, Sohyun Lee, Sungyeon Kim, Jungseul Ok, Suha Kwak
We consider the problem of active domain adaptation (ADA) to unlabeled target data, a subset of which is actively selected and labeled given a budget constraint.
1 code implementation • 22 Jun 2022 • Nayeong Kim, Sehyun Hwang, Sungsoo Ahn, Jaesik Park, Suha Kwak
We propose a new method for training debiased classifiers with no spurious attribute label.
no code implementations • CVPR 2022 • Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak
At the heart of our framework lies an algorithm that investigates contexts of data on the embedding space to predict their class-equivalence relations as pseudo labels.
no code implementations • CVPR 2022 • Dongkeun Kim, Jinsung Lee, Minsu Cho, Suha Kwak
Group activity recognition is the task of understanding the activity conducted by a group of people as a whole in a multi-person video.
1 code implementation • CVPR 2022 • Donghyeon Kwon, Suha Kwak
This paper studies semi-supervised learning of semantic segmentation, which assumes that only a small portion of training images are labeled and the others remain unlabeled.
2 code implementations • CVPR 2022 • Sohyun Lee, Taeyoung Son, Suha Kwak
Robust visual recognition under adverse weather conditions is of great importance in real-world applications.
Ranked #4 on Domain Adaptation on Cityscapes-to-FoggyDriving
1 code implementation • CVPR 2022 • Ahyun Seo, Byungjin Kim, Suha Kwak, Minsu Cho
The inherent challenge of detecting symmetries stems from arbitrary orientations of symmetry patterns; a reflection symmetry mirrors itself against an axis with a specific orientation while a rotation symmetry matches its rotated copy with a specific orientation.
no code implementations • CVPR 2022 • Namyup Kim, Dongwon Kim, Cuiling Lan, Wenjun Zeng, Suha Kwak
Most existing methods for this task rely heavily on convolutional neural networks, which, however, have trouble capturing long-range dependencies between entities in the language expression and are not flexible enough for modeling interactions between the two different modalities.
3 code implementations • CVPR 2022 • Junhyeong Cho, Youngseok Yoon, Suha Kwak
To implement this idea, we propose Collaborative Glance-Gaze TransFormer (CoFormer) that consists of two modules: Glance transformer for activity classification and Gaze transformer for entity estimation.
Ranked #2 on Situation Recognition on imSitu
no code implementations • 4 Jan 2022 • Kyungmoon Lee, Sungyeon Kim, Seunghoon Hong, Suha Kwak
Motivated by this, we introduce a new data augmentation approach that synthesizes novel classes and their embedding vectors.
no code implementations • CVPR 2022 • Juwon Kang, Sohyun Lee, Namyup Kim, Suha Kwak
Existing methods in this direction suppose that a domain can be characterized by styles of its images, and train a network using style-augmented data so that the network is not biased to particular style distributions.
1 code implementation • 19 Nov 2021 • Junhyeong Cho, Youngseok Yoon, Hyeonjun Lee, Suha Kwak
Grounded Situation Recognition (GSR) is the task that not only classifies a salient action (verb), but also predicts entities (nouns) associated with semantic roles and their locations in the given image.
Ranked #5 on Situation Recognition on imSitu
1 code implementation • NeurIPS 2021 • Manjin Kim, Heeseung Kwon, Chunyu Wang, Suha Kwak, Minsu Cho
Convolution has arguably been the most important feature transform for modern neural networks, leading to the advance of deep learning.
Ranked #12 on Action Recognition on Diving-48
no code implementations • 29 Sep 2021 • Kyungmoon Lee, Sungyeon Kim, Suha Kwak
For domain generalization, the task of learning a model that generalizes to unseen target domains utilizing multiple source domains, many approaches explicitly align the distribution of the domains.
Ranked #38 on Domain Generalization on Office-Home
no code implementations • 29 Sep 2021 • Namyup Kim, Taeyoung Son, Jaehyun Pahk, Cuiling Lan, Wenjun Zeng, Suha Kwak
We also present a method which injects styles of the web-crawled images into training images on-the-fly during training, which enables the network to experience images of diverse styles with reliable labels for effective training.
1 code implementation • ICCV 2021 • Boseung Jeong, Jicheol Park, Suha Kwak
Attribute-based person search is the task of finding person images that are best matched with a set of text attributes given as a query.
no code implementations • 5 Jul 2021 • Minkyo Seo, Yoonho Lee, Suha Kwak
This paper studies probability distributions of penultimate activations of classification networks.
2 code implementations • CVPR 2021 • Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak
Our method exploits pairwise similarities between samples in the source embedding space as the knowledge, and transfers them through a loss used for learning target embedding models.
1 code implementation • ICCV 2021 • Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho
With a sufficient volume of the neighborhood in space and time, it effectively captures long-term interaction and fast motion in the video, leading to robust action recognition.
Ranked #19 on Action Recognition on Something-Something V1 (using extra training data)
1 code implementation • 1 Jan 2021 • Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho
We leverage the whole volume of STSS and let our model learn to extract an effective motion representation from it.
no code implementations • 1 Jan 2021 • Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak
To this end, we design a new loss called smooth contrastive loss, which pulls together or pushes apart a pair of samples in a target embedding space with strength determined by their semantic similarity in the source embedding space; an analysis of the loss reveals that this property enables more important pairs to contribute more to learning the target embedding space.
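The pull/push behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the semantic similarity is taken to be a Gaussian kernel over source-space distances, the push term is a squared hinge on a margin, and the names `pairwise_dist`, `smooth_contrastive_loss`, `sigma`, and `margin` are hypothetical rather than taken from the paper.

```python
import numpy as np


def pairwise_dist(x):
    """Euclidean distance between every pair of rows in x (shape: n x n)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def smooth_contrastive_loss(source, target, sigma=1.0, margin=1.0):
    """Sketch of a similarity-weighted contrastive loss.

    source: (n, d_s) embeddings from the fixed source model.
    target: (n, d_t) embeddings from the model being trained.
    sigma, margin: assumed hyperparameters (kernel bandwidth, push margin).
    """
    d_src = pairwise_dist(source)
    d_tgt = pairwise_dist(target)

    # Assumed similarity: Gaussian kernel on source distances, in (0, 1].
    w = np.exp(-(d_src ** 2) / sigma)

    # Similar pairs (w near 1) are pulled together in the target space.
    pull = w * d_tgt ** 2
    # Dissimilar pairs (w near 0) are pushed apart up to the margin.
    push = (1.0 - w) * np.maximum(margin - d_tgt, 0.0) ** 2

    # Ignore self-pairs on the diagonal and average over samples.
    off_diag = ~np.eye(len(source), dtype=bool)
    return ((pull + push) * off_diag).sum() / len(source)


# Two near-duplicate samples and one distant sample in the source space.
source = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 0.0]])
# A target embedding that preserves the source structure ...
target_good = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 0.0]])
# ... and one that collapses every sample to the same point.
target_bad = np.zeros((3, 2))

loss_good = smooth_contrastive_loss(source, target_good)
loss_bad = smooth_contrastive_loss(source, target_bad)
print(loss_good, loss_bad)
```

In this sketch each pair's weight is continuous rather than a binary same-class label, so a pair that is strongly similar or strongly dissimilar in the source space contributes more to the corresponding term than an ambiguous pair does.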
2 code implementations • ECCV 2020 • Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho
As the frame-by-frame optical flows require heavy computation, incorporating motion information has remained a major computational bottleneck for video understanding.
Ranked #1 on Video Classification on Something-Something V2
1 code implementation • 17 Jul 2020 • Taeyoung Son, Juwon Kang, Namyup Kim, Sunghyun Cho, Suha Kwak
Despite the great advances in visual recognition, it has been witnessed that recognition models trained on clean images of common datasets are not robust against distorted images in the real world.
3 code implementations • CVPR 2020 • Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak
The former class can leverage fine-grained semantic relations between data points, but slows convergence in general due to its high training complexity.
Ranked #10 on Metric Learning on CUB-200-2011 (using extra training data)
1 code implementation • CVPR 2019 • Woong-Gi Chang, Tackgeun You, Seonguk Seo, Suha Kwak, Bohyung Han
In the first stage, we estimate pseudo-labels for the examples in the target domain using an external unsupervised domain adaptation algorithm (for example, MSTN or CPUA) integrating the proposed domain-specific batch normalization.
1 code implementation • CVPR 2019 • Sungyeon Kim, Minkyo Seo, Ivan Laptev, Minsu Cho, Suha Kwak
Metric learning for visual similarity has mostly adopted binary supervision indicating whether a pair of images are of the same class or not.
no code implementations • 15 Apr 2019 • Seungkwan Lee, Suha Kwak, Minsu Cho
Bounding-box regression is a popular technique to refine or predict localization boxes in recent object detection approaches.
6 code implementations • CVPR 2019 • Jiwoon Ahn, Sunghyun Cho, Suha Kwak
For generating the pseudo labels, we first identify confident seed areas of object classes from attention maps of an image classification model, and propagate them to discover the entire instance areas with accurate boundaries.
2 code implementations • CVPR 2018 • Jiwoon Ahn, Suha Kwak
To alleviate this issue, we present a novel framework that generates segmentation labels of images given their image-level class labels.
no code implementations • CVPR 2017 • Seunghoon Hong, Donghun Yeo, Suha Kwak, Honglak Lee, Bohyung Han
Our goal is to overcome this limitation with no additional human intervention by retrieving videos relevant to target class labels from a web repository, and generating segmentation labels from the retrieved videos to simulate strong supervision for semantic segmentation.
no code implementations • CVPR 2016 • Suha Kwak, Minsu Cho, Ivan Laptev
We address the problem of learning a pose-aware, compact embedding that places images with similar human poses close together in the embedding space.
no code implementations • ICCV 2015 • Suha Kwak, Minsu Cho, Ivan Laptev, Jean Ponce, Cordelia Schmid
This paper addresses the problem of automatically localizing dominant objects as spatio-temporal tubes in a noisy collection of videos with minimal or even no supervision.
no code implementations • 24 Feb 2015 • Seunghoon Hong, Tackgeun You, Suha Kwak, Bohyung Han
We propose an online visual tracking algorithm that learns a discriminative saliency map using a Convolutional Neural Network (CNN).
no code implementations • CVPR 2015 • Minsu Cho, Suha Kwak, Cordelia Schmid, Jean Ponce
This paper addresses unsupervised discovery and localization of dominant objects from a noisy image collection with multiple object classes.
no code implementations • NeurIPS 2014 • Jan Feyereisl, Suha Kwak, Jeany Son, Bohyung Han
We propose a structured prediction algorithm for object localization based on Support Vector Machines (SVMs) using privileged information.
no code implementations • CVPR 2013 • Suha Kwak, Bohyung Han, Joon Hee Han
We present a joint estimation technique for event localization and role assignment when the target video event is described by a scenario.