no code implementations • ECCV 2020 • Sangryul Jeon, Dongbo Min, Seungryong Kim, Jihwan Choe, Kwanghoon Sohn
Establishing dense semantic correspondences requires dealing with large geometric variations caused by the unconstrained setting of images.
no code implementations • 13 Oct 2024 • Eungbean Lee, Somi Jeong, Kwanghoon Sohn
Our method formulates the task as a stochastic Brownian bridge process, a diffusion process whose fixed initial point serves as structure control, and translates it into the corresponding photo-realistic image conditioned solely on the given exemplar image.
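The Brownian bridge marginal underlying this formulation can be illustrated numerically. The function below is a hypothetical toy sampler (scalar state, unit diffusion coefficient), not the paper's model: the mean interpolates linearly between the two pinned endpoints and the variance vanishes at both ends.

```python
import random

def brownian_bridge_step(x0, xT, t, T, sigma=1.0):
    """Sample the Brownian bridge marginal at time t, pinned at
    x0 (t = 0) and xT (t = T).  Toy scalar sketch, not the paper's code."""
    s = t / T
    mean = (1.0 - s) * x0 + s * xT          # linear interpolation of endpoints
    std = sigma * (t * (T - t) / T) ** 0.5  # variance t(T - t)/T, zero at ends
    return mean + std * random.gauss(0.0, 1.0)
```

At t = 0 and t = T the noise term is zero, so the sample equals the pinned structure and exemplar endpoints exactly; in between, noise grows and then shrinks again.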
1 code implementation • 18 Jul 2024 • Ilhoon Yoon, Hyeongjun Kwon, Jin Kim, Junyoung Park, Hyunsung Jang, Kwanghoon Sohn
Extensive experiments demonstrate that our LPLD loss leads to effective adaptation by reducing false negatives and facilitating the use of domain-invariant knowledge from the source model.
no code implementations • 31 May 2024 • Seongheon Park, Hyuk Kwon, Kwanghoon Sohn, Kibok Lee
Open-world semi-supervised learning (OWSSL) extends conventional semi-supervised learning to open-world scenarios by taking account of novel categories in unlabeled datasets.
1 code implementation • CVPR 2024 • Jihyun Kim, Changjae Oh, Hoseok Do, Soohyun Kim, Kwanghoon Sohn
To do this, we combine the strengths of Generative Adversarial Networks (GANs) and diffusion models (DMs) by injecting the multi-modal features of the DM into the latent space of the pre-trained GANs.
1 code implementation • 15 Apr 2024 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn
This paper introduces VLAP, a novel approach that bridges pretrained vision models and large language models (LLMs) to make frozen LLMs understand the visual world.
1 code implementation • CVPR 2024 • Hyeongjun Kwon, Jinhyun Jang, Jin Kim, Kwonyoung Kim, Kwanghoon Sohn
Visual scenes are naturally organized in a hierarchy, where a coarse semantic is recursively composed of several fine details.
1 code implementation • 10 Nov 2023 • Junyoung Park, Jin Kim, Hyeongjun Kwon, Ilhoon Yoon, Kwanghoon Sohn
Given the inevitability of domain shifts during inference in real-world applications, test-time adaptation (TTA) is essential for model adaptation after deployment.
1 code implementation • 14 Aug 2023 • Jinhyun Jang, Taeyong Song, Kwanghoon Sohn
Aerial-to-ground image synthesis is an emerging and challenging problem that aims to synthesize a ground image from an aerial image.
1 code implementation • ICCV 2023 • Jinhyun Jang, Jungin Park, Jin Kim, Hyeongjun Kwon, Kwanghoon Sohn
Recent DETR-based video grounding models predict moment timestamps directly, without hand-crafted components such as pre-defined proposals or non-maximum suppression, by learning moment queries.
1 code implementation • ICCV 2023 • Hanjae Kim, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn
Previous works for CZSL often struggle with the contextuality between attribute and object, the discriminability of visual features, and the long-tailed distribution of real-world compositional data.
no code implementations • CVPR 2023 • Minsu Kim, Seungryong Kim, Jungin Park, Seongheon Park, Kwanghoon Sohn
Modern mixture-based data augmentation can regularize models against overfitting to the training data in various computer vision applications, but a proper data augmentation technique tailored to part-based Visible-Infrared person Re-IDentification (VI-ReID) models remains unexplored.
no code implementations • CVPR 2023 • Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, Kwanghoon Sohn
Recent progress in deterministic prompt learning has made it a promising alternative for various downstream vision tasks, enabling models to learn powerful visual representations with the help of pre-trained vision-language models.
1 code implementation • CVPR 2023 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn
In this paper, we efficiently transfer the superior representation power of vision foundation models, such as ViT and Swin, to video understanding with only a few trainable parameters.
Ranked #1 on Action Classification on Diving-48
1 code implementation • 16 Mar 2023 • Tuan N. Tang, Kwonyoung Kim, Kwanghoon Sohn
To this end, we introduce TemporalMaxer, which minimizes long-term temporal context modeling while maximizing information from the extracted video clip features with a basic, parameter-free max-pooling block operating on local regions.
Ranked #1 on Temporal Action Localization on MUSES
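The parameter-free local max-pooling at the core of this idea can be sketched in a few lines of pure Python; `temporal_max_pool` is a hypothetical helper operating per channel over a clip-feature sequence, not the released implementation:

```python
def temporal_max_pool(features, kernel=3, stride=2):
    """Parameter-free 1-D max pooling over a sequence of clip features.

    `features` is a list of per-clip feature vectors (lists of numbers);
    each output vector keeps the channel-wise maximum within a local
    temporal window.  Toy sketch, not the authors' code."""
    pooled = []
    for start in range(0, len(features) - kernel + 1, stride):
        window = features[start:start + kernel]
        pooled.append([max(channel) for channel in zip(*window)])
    return pooled
```

Because the block has no learnable weights, it adds no parameters or training cost while still aggregating local temporal context.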
no code implementations • CVPR 2023 • Hyesong Choi, Hunsang Lee, Wonil Song, Sangryul Jeon, Kwanghoon Sohn, Dongbo Min
Recent vision-based reinforcement learning (RL) methods have found extracting high-level features from raw pixels with self-supervised learning to be effective in learning policies.
no code implementations • CVPR 2023 • Taeyong Song, Sunok Kim, Kwanghoon Sohn
In this paper, we present a novel spatially-adaptive self-similarity (SASS) for unsupervised asymmetric stereo matching.
1 code implementation • 8 Nov 2022 • Tuan N. Tang, Jungin Park, Kwonyoung Kim, Kwanghoon Sohn
In addition, the evaluation for Online Detection of Action Start (ODAS) demonstrates the effectiveness and robustness of our method in the online setting.
no code implementations • 24 Oct 2022 • Dahye Kim, Jungin Park, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn
Given an untrimmed video and a language query depicting a specific temporal moment in the video, video grounding aims to localize the time interval by understanding the text and video simultaneously.
1 code implementation • 27 Jul 2022 • Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, Kwanghoon Sohn
To mitigate this issue, we propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix, to provide a robust initialization of stereo models for online stereo adaptation.
no code implementations • CVPR 2022 • Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn
This paper presents Probabilistic Video Contrastive Learning, a self-supervised representation learning method that bridges contrastive learning with probabilistic representation.
1 code implementation • CVPR 2022 • Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn
The rise of deep neural networks has led to several breakthroughs for semantic segmentation.
no code implementations • 14 Feb 2022 • Taeyong Song, Sunok Kim, SungTai Kim, Jaeseok Lee, Kwanghoon Sohn
By learning sampling offsets relative to the grid of a standard convolution, the network can robustly extract features from targets with shape variations for SAR ship detection.
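The offset-based sampling idea (in the style of deformable convolution) can be sketched in 1-D. `deformable_sample_1d` is a hypothetical illustration with a fixed 3-tap grid, learned per-tap offsets, and linear interpolation for fractional positions; it is not the paper's implementation:

```python
import math

def deformable_sample_1d(feat, center, offsets):
    """Sample a 1-D feature map at center + regular tap + learned offset,
    using linear interpolation for fractional positions.  Toy sketch of
    offset-based (deformable) sampling, not the paper's code."""
    taps = [-1, 0, 1]  # regular grid of a standard 3-tap convolution
    out = []
    for tap, off in zip(taps, offsets):
        p = center + tap + off
        lo = max(0, min(len(feat) - 2, math.floor(p)))
        frac = p - lo
        out.append((1 - frac) * feat[lo] + frac * feat[lo + 1])
    return out
```

With zero offsets this reduces to ordinary grid sampling; non-zero offsets let the kernel deform toward the target's actual shape.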
no code implementations • 6 Feb 2022 • Somi Jeong, Jiyoung Lee, Kwanghoon Sohn
We show that the proposed method produces visually diverse and plausible results in multiple domains compared to the state-of-the-art methods.
no code implementations • 6 Jan 2022 • Jaehoon Cho, Seungryong Kim, Kwanghoon Sohn
To address this problem, we propose a novel network architecture based on a memory network that explicitly helps to capture long-term rain streak information in the time-lapse data.
no code implementations • CVPR 2022 • Hunsang Lee, Hyesong Choi, Kwanghoon Sohn, Dongbo Min
In this way, the pair-wise operation establishes non-local connectivity while maintaining the desired properties of local attention, i.e., an inductive bias toward locality and linear complexity with respect to input resolution.
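The linear-complexity property of local attention can be made concrete with a minimal 1-D sketch: each query attends only to keys within a fixed window, so the total cost is O(N x window) rather than O(N^2). Scalar features are used for brevity; this is a generic illustration, not the paper's attention module:

```python
import math

def local_attention_1d(queries, keys, values, window=1):
    """Windowed softmax attention over scalar features: query i attends
    only to positions within +/-window, giving linear cost in sequence
    length.  Generic toy sketch, not the paper's module."""
    out = []
    for i, q in enumerate(queries):
        lo, hi = max(0, i - window), min(len(keys), i + window + 1)
        scores = [q * keys[j] for j in range(lo, hi)]
        m = max(scores)                               # stabilize softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(w * values[lo + j] for j, w in enumerate(weights)) / z)
    return out
```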
no code implementations • 9 Nov 2021 • Hyeongjun Kwon, Somi Jeong, Sunok Kim, Kwanghoon Sohn
We address the problem of few-shot semantic segmentation (FSS), which aims to segment novel class objects in a target image with a few annotated samples.
no code implementations • 22 Oct 2021 • Jaehoon Cho, Jiyoung Lee, Changjae Oh, Wonil Song, Kwanghoon Sohn
Video prediction, forecasting the future frames from a sequence of input frames, is a challenging task since the view changes are influenced by various factors, such as the global context surrounding the scene and local motion dynamics.
no code implementations • 22 Oct 2021 • Jaehoon Cho, Dongbo Min, Youngjung Kim, Kwanghoon Sohn
This manual is intended to provide a detailed description of the DIML/CVL RGB-D dataset.
no code implementations • 29 Sep 2021 • Hyesong Choi, Hunsang Lee, Wonil Song, Sangryul Jeon, Kwanghoon Sohn, Dongbo Min
The proposed method imposes similarity constraints on three latent volumes: query representations warped by estimated flows, target representations predicted by the transition model, and target representations of the future state.
no code implementations • 29 Sep 2021 • Wonil Song, Sangryul Jeon, Hyesong Choi, Kwanghoon Sohn, Dongbo Min
Given the latent representations as skills, a skill-based policy network is trained to generate trajectories similar to those of the learned decoder of the trajectory VAE.
no code implementations • 31 Aug 2021 • Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn
Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics.
no code implementations • ICCV 2021 • Sunghun Joung, Seungryong Kim, Minsu Kim, Ig-Jae Kim, Kwanghoon Sohn
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object and achieves competitive performance on fine-grained image recognition and vehicle re-identification.
no code implementations • 25 Jun 2021 • Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, Ashish Kapoor
The ability to perform causal and counterfactual reasoning is a central property of human intelligence.
no code implementations • CVPR 2021 • Sangryul Jeon, Dongbo Min, Seungryong Kim, Kwanghoon Sohn
We present a novel framework for contrastive learning of pixel-level representation using only unlabeled video.
no code implementations • CVPR 2021 • Hanjae Kim, Sunghun Joung, Ig-Jae Kim, Kwanghoon Sohn
Existing person search methods integrate person detection and re-identification (re-ID) modules into a unified system.
1 code implementation • NeurIPS 2021 • Seokju Cho, Sunghwan Hong, Sangryul Jeon, Yunsung Lee, Kwanghoon Sohn, Seungryong Kim
We propose a novel cost aggregation network, called Cost Aggregation Transformers (CATs), to find dense correspondences between semantically similar images with additional challenges posed by large intra-class appearance and geometric variations.
Ranked #5 on Semantic correspondence on PF-WILLOW
no code implementations • CVPR 2021 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn
As a result, our method can learn question-conditioned visual representations attributed to appearance and motion that show powerful capability for video question answering.
no code implementations • CVPR 2021 • Somi Jeong, Youngjung Kim, Eungbean Lee, Kwanghoon Sohn
We present a novel unsupervised framework for instance-level image-to-image translation.
no code implementations • CVPR 2021 • Jiyoung Lee, Soo-Whan Chung, Sunok Kim, Hong-Goo Kang, Kwanghoon Sohn
In this paper, we address the problem of separating individual speech signals from videos using audio-visual neural processing.
1 code implementation • 2 Jan 2021 • Matteo Poggi, Seungryong Kim, Fabio Tosi, Sunok Kim, Filippo Aleotti, Dongbo Min, Kwanghoon Sohn, Stefano Mattoccia
Stereo matching is one of the most popular techniques to estimate dense depth maps by finding the disparity between matching pixels in two synchronized and rectified images.
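The disparity search described in this sentence can be illustrated with a classical toy baseline: 1-D block matching under a sum-of-absolute-differences (SAD) cost. This is a didactic sketch of the general principle, not any specific method from the survey:

```python
def disparity_sad(left, right, max_disp=3, patch=1):
    """Toy 1-D block matching on a pair of scanlines: for each left pixel,
    pick the disparity d minimizing the SAD cost against the right
    scanline shifted by d.  Didactic sketch of classical stereo matching."""
    n = len(left)
    disp = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            if x - d < 0:
                break  # matching pixel would fall outside the right image
            cost = sum(
                abs(left[min(max(x + k, 0), n - 1)]
                    - right[min(max(x - d + k, 0), n - 1)])
                for k in range(-patch, patch + 1))
            if cost < best_cost:
                best_d, best_cost = d, cost
        disp.append(best_d)
    return disp
```

A bright feature shifted by one pixel between the two scanlines is recovered as disparity 1 at that location; larger disparities correspond to closer surfaces.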
1 code implementation • 15 Dec 2020 • Minsu Kim, Sunghun Joung, Seungryong Kim, Jungin Park, Ig-Jae Kim, Kwanghoon Sohn
Existing techniques to adapt semantic segmentation networks across the source and target domains within deep convolutional neural networks (CNNs) deal with all the samples from the two domains in a global or category-aware manner.
1 code implementation • ICCV 2021 • Hyesong Choi, Hunsang Lee, Sunkyung Kim, Sunok Kim, Seungryong Kim, Kwanghoon Sohn, Dongbo Min
To cope with the prediction error of the confidence map itself, we also leverage a threshold network that learns the threshold dynamically, conditioned on the pseudo depth maps.
no code implementations • ECCV 2020 • Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn
The goal of video summarization is to select keyframes that are visually diverse and can represent a whole story of an input video.
no code implementations • CVPR 2020 • Sunghun Joung, Seungryong Kim, Hanjae Kim, Minsu Kim, Ig-Jae Kim, Junghyun Cho, Kwanghoon Sohn
To overcome this limitation, we introduce a learnable module, cylindrical convolutional networks (CCNs), that exploits a cylindrical representation of a convolutional kernel defined in 3D space.
no code implementations • ICCV 2019 • Sangryul Jeon, Dongbo Min, Seungryong Kim, Kwanghoon Sohn
Based on the key insight that the two tasks can mutually provide supervision to each other, our networks accomplish this through a joint loss function that alternately imposes a consistency constraint between the two tasks, thereby boosting the performance and addressing the lack of training data in a principled manner.
1 code implementation • ICCV 2019 • Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, Kwanghoon Sohn
We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner.
Ranked #1 on Emotion Recognition in Context on CAER-Dynamic
no code implementations • 23 Apr 2019 • Jaehoon Cho, Dongbo Min, Youngjung Kim, Kwanghoon Sohn
In this paper, we present a simple yet effective approach for monocular depth estimation using stereo image pairs.
no code implementations • CVPR 2019 • Seungryong Kim, Dongbo Min, Somi Jeong, Sunok Kim, Sangryul Jeon, Kwanghoon Sohn
SAM-Net accomplishes this through an iterative process of establishing reliable correspondences by reducing the attribute discrepancy between the images and synthesizing attribute transferred images using the learned correspondences.
1 code implementation • NeurIPS 2018 • Seungryong Kim, Stephen Lin, Sangryul Jeon, Dongbo Min, Kwanghoon Sohn
Our networks accomplish this through an iterative process of estimating spatial transformations between the input images and using these transformations to generate aligned convolutional activations.
no code implementations • ECCV 2018 • Sangryul Jeon, Seungryong Kim, Dongbo Min, Kwanghoon Sohn
To the best of our knowledge, it is the first work that attempts to estimate dense affine transformation fields in a coarse-to-fine manner within deep networks.
no code implementations • ICCV 2017 • Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn
In this way, our approach draws solutions from the continuous space of affine transformations in a manner that can be computed efficiently through constant-time edge-aware filtering and a proposed affine-varying CNN-based descriptor.
1 code implementation • CVPR 2017 • Seungryong Kim, Dongbo Min, Bumsub Ham, Sangryul Jeon, Stephen Lin, Kwanghoon Sohn
The sampling patterns of local structure and the self-similarity measure are jointly learned within the proposed network in an end-to-end and multi-scale manner.
no code implementations • CVPR 2017 • Youngjung Kim, Hyungjoo Jung, Dongbo Min, Kwanghoon Sohn
The proposed framework enables the convolutional neural networks (CNNs) to operate as a prior or regularizer in the AM algorithm.
no code implementations • 27 Apr 2016 • Seungryong Kim, Dongbo Min, Bumsub Ham, Minh N. Do, Kwanghoon Sohn
In this paper, we propose a novel dense descriptor, called dense adaptive self-correlation (DASC), to estimate multi-modal and multi-spectral dense correspondences.
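The self-correlation idea behind such descriptors, describing a pixel by correlations between nearby patches rather than by raw intensities, can be sketched in 1-D. The helper below is a hypothetical toy (Pearson correlation over patch pairs), not the DASC implementation; the usage note demonstrates why it suits multi-modal matching:

```python
def self_correlation_descriptor(img, x, pairs, patch=1):
    """Describe position x of a 1-D signal by Pearson correlations between
    pairs of nearby patches (offsets given relative to x).  Toy sketch of
    the self-correlation idea, not the DASC implementation."""
    def patch_at(p):
        return img[p - patch:p + patch + 1]

    def pearson(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
        va = sum((ai - ma) ** 2 for ai in a) ** 0.5
        vb = sum((bi - mb) ** 2 for bi in b) ** 0.5
        return cov / (va * vb)

    return [pearson(patch_at(x + o1), patch_at(x + o2)) for o1, o2 in pairs]
```

Because Pearson correlation is invariant to affine intensity changes, the descriptor of a signal and of a rescaled, offset copy of it (e.g. the same scene in a different modality) coincide, which is the property that makes self-correlation attractive for multi-modal and multi-spectral correspondence.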
no code implementations • 26 Apr 2016 • Youngjung Kim, Dongbo Min, Bumsub Ham, Kwanghoon Sohn
In this paper, we introduce a highly efficient splitting-based method for global EPS that minimizes the objective function of $\ell_2$ data and prior terms (possibly non-smooth and non-convex) in linear time.
no code implementations • 21 Mar 2016 • Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn
We present a novel descriptor, called deep self-convolutional activations (DeSCA), designed for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions.
1 code implementation • 21 Mar 2016 • Seungryong Kim, Kihong Park, Kwanghoon Sohn, Stephen Lin
We present a method for jointly predicting a depth map and intrinsic images from single-image input.
no code implementations • CVPR 2015 • Seungryong Kim, Dongbo Min, Bumsub Ham, Seungchul Ryu, Minh N. Do, Kwanghoon Sohn
To further improve the matching quality and runtime efficiency, we propose patch-wise receptive field pooling, in which a sampling pattern is optimized via discriminative learning.