Search Results for author: Kwanghoon Sohn

Found 55 papers, 18 papers with code

Guided Semantic Flow

no code implementations ECCV 2020 Sangryul Jeon, Dongbo Min, Seungryong Kim, Jihwan Choe, Kwanghoon Sohn

Establishing dense semantic correspondences requires dealing with large geometric variations caused by the unconstrained setting of images.

Semantic correspondence

Bridging Vision and Language Spaces with Assignment Prediction

1 code implementation 15 Apr 2024 Jungin Park, Jiyoung Lee, Kwanghoon Sohn

This paper introduces VLAP, a novel approach that bridges pretrained vision models and large language models (LLMs) to make frozen LLMs understand the visual world.

Cross-Modal Retrieval · Image Captioning +3

Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping

1 code implementation 1 Apr 2024 Hyeongjun Kwon, Jinhyun Jang, Jin Kim, Kwonyoung Kim, Kwanghoon Sohn

Visual scenes are naturally organized in a hierarchy, where a coarse semantic concept is recursively composed of several fine details.

Image Classification · Scene Understanding

Layer-wise Auto-Weighting for Non-Stationary Test-Time Adaptation

1 code implementation 10 Nov 2023 Junyoung Park, Jin Kim, Hyeongjun Kwon, Ilhoon Yoon, Kwanghoon Sohn

Given the inevitability of domain shifts during inference in real-world applications, test-time adaptation (TTA) is essential for model adaptation after deployment.

Test-time Adaptation

Semantic-aware Network for Aerial-to-Ground Image Synthesis

1 code implementation 14 Aug 2023 Jinhyun Jang, Taeyong Song, Kwanghoon Sohn

Aerial-to-ground image synthesis is an emerging and challenging problem that aims to synthesize a ground image from an aerial image.

Image Generation

Knowing Where to Focus: Event-aware Transformer for Video Grounding

1 code implementation ICCV 2023 Jinhyun Jang, Jungin Park, Jin Kim, Hyeongjun Kwon, Kwanghoon Sohn

Recent DETR-based video grounding models directly predict moment timestamps without any hand-crafted components, such as pre-defined proposals or non-maximum suppression, by learning moment queries.
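
A rough sketch of such query-based moment prediction (illustrative dimensions and names, not any paper's released code): learned queries are decoded against video features and regressed directly to normalized (center, width) timestamps, with no proposals or NMS.

```python
import torch
import torch.nn as nn

# Sketch of DETR-style moment prediction; all sizes are illustrative.
class MomentHead(nn.Module):
    def __init__(self, dim: int = 256, num_queries: int = 10):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)   # learned moment queries
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True),
            num_layers=2)
        self.span = nn.Linear(dim, 2)                   # normalized (center, width)

    def forward(self, video_feats):                     # (batch, frames, dim)
        q = self.queries.weight.unsqueeze(0).expand(video_feats.size(0), -1, -1)
        return self.span(self.decoder(q, video_feats)).sigmoid()

print(MomentHead()(torch.randn(2, 100, 256)).shape)    # torch.Size([2, 10, 2])
```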

Moment Queries · Sentence +1

Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning

1 code implementation ICCV 2023 Hanjae Kim, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn

Previous works on CZSL often struggle to capture the contextuality between attribute and object, the discriminability of visual features, and the long-tailed distribution of real-world compositional data.

Attribute · Compositional Zero-Shot Learning +1

PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification

no code implementations CVPR 2023 Minsu Kim, Seungryong Kim, Jungin Park, Seongheon Park, Kwanghoon Sohn

Modern mixture-based data augmentation can regularize models against overfitting to the training data in various computer vision applications, but a proper data augmentation technique tailored to part-based Visible-Infrared person Re-IDentification (VI-ReID) models remains unexplored.
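
For context, the generic "mixture-based technique" the excerpt refers to is mixup-style interpolation; a minimal sketch of standard mixup follows (PartMix itself mixes part-level descriptors across identities, which is not reproduced here).

```python
import torch

# Standard mixup-style augmentation (Zhang et al., 2018), the generic
# mixture-based regularizer mentioned above; not the PartMix algorithm.
def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 1.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    # Train with lam * loss(pred, y) + (1 - lam) * loss(pred, y[perm]).
    return x_mix, y, y[perm], lam
```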

Contrastive Learning · Data Augmentation +1

Probabilistic Prompt Learning for Dense Prediction

no code implementations CVPR 2023 Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, Kwanghoon Sohn

Deterministic prompt learning has recently emerged as a promising alternative for various downstream vision tasks, enabling models to learn powerful visual representations with the help of pre-trained vision-language models.

Attribute · Text Matching

Dual-path Adaptation from Image to Video Transformers

1 code implementation CVPR 2023 Jungin Park, Jiyoung Lee, Kwanghoon Sohn

In this paper, we efficiently transfer the superior representation power of vision foundation models, such as ViT and Swin, to video understanding with only a few trainable parameters.

Action Classification · Action Recognition In Videos +2

TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization

1 code implementation 16 Mar 2023 Tuan N. Tang, Kwonyoung Kim, Kwanghoon Sohn

To this end, we introduce TemporalMaxer, which minimizes long-term temporal context modeling while maximizing the information extracted from video clip features with a basic, parameter-free max-pooling block that operates on local regions.
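
A minimal sketch of such a parameter-free, local max-pooling block over clip features, assuming a (batch, channels, time) layout (illustrative, not the released implementation):

```python
import torch
import torch.nn as nn

# Parameter-free temporal max pooling over clip features; no learnable
# weights, just a local max along the time axis.
class TemporalMaxPoolBlock(nn.Module):
    def __init__(self, kernel_size: int = 3, stride: int = 2):
        super().__init__()
        self.pool = nn.MaxPool1d(kernel_size, stride=stride,
                                 padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(x)

feats = torch.randn(2, 512, 128)             # (batch, channels, time)
print(TemporalMaxPoolBlock()(feats).shape)   # torch.Size([2, 512, 64])
```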

Temporal Action Localization · Video Understanding

Local-Guided Global: Paired Similarity Representation for Visual Reinforcement Learning

no code implementations CVPR 2023 Hyesong Choi, Hunsang Lee, Wonil Song, Sangryul Jeon, Kwanghoon Sohn, Dongbo Min

Recent vision-based reinforcement learning (RL) methods have found extracting high-level features from raw pixels with self-supervised learning to be effective in learning policies.

Atari Games · reinforcement-learning +3

SimOn: A Simple Framework for Online Temporal Action Localization

1 code implementation 8 Nov 2022 Tuan N. Tang, Jungin Park, Kwonyoung Kim, Kwanghoon Sohn

In addition, the evaluation for Online Detection of Action Start (ODAS) demonstrates the effectiveness and robustness of our method in the online setting.

Temporal Action Localization

Language-free Training for Zero-shot Video Grounding

no code implementations 24 Oct 2022 Dahye Kim, Jungin Park, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn

Given an untrimmed video and a language query depicting a specific temporal moment in the video, video grounding aims to localize the time interval by understanding the text and video simultaneously.

Video Grounding

PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation

no code implementations 27 Jul 2022 Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, Kwanghoon Sohn

To mitigate this issue, we propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix, to provide a robust initialization of stereo models for online stereo adaptation.

Autonomous Driving · Meta-Learning

Probabilistic Representations for Video Contrastive Learning

no code implementations CVPR 2022 Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn

This paper presents Probabilistic Video Contrastive Learning, a self-supervised representation learning method that bridges contrastive learning with probabilistic representation.

Action Recognition · Contrastive Learning +3

Context-Preserving Instance-Level Augmentation and Deformable Convolution Networks for SAR Ship Detection

no code implementations 14 Feb 2022 Taeyong Song, Sunok Kim, SungTai Kim, Jaeseok Lee, Kwanghoon Sohn

By learning sampling offsets for the grid of a standard convolution, the network can robustly extract features from targets with shape variations for SAR ship detection.
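
A minimal sketch of offset-learning deformable convolution using torchvision's generic op (the paper's detector details are not reproduced; names are illustrative):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

# A plain conv predicts a 2D offset for each of the k*k sampling taps,
# which the deformable conv then uses to sample off the regular grid.
class DeformBlock(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(c_in, c_out, k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.deform(x, self.offset(x))

x = torch.randn(1, 64, 32, 32)
print(DeformBlock(64, 128)(x).shape)  # torch.Size([1, 128, 32, 32])
```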

Data Augmentation · Instance Segmentation +2

Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution

no code implementations 6 Feb 2022 Somi Jeong, Jiyoung Lee, Kwanghoon Sohn

We show that the proposed method produces visually diverse and plausible results in multiple domains compared to the state-of-the-art methods.

Disentanglement · Translation +1

Memory-guided Image De-raining Using Time-Lapse Data

no code implementations 6 Jan 2022 Jaehoon Cho, Seungryong Kim, Kwanghoon Sohn

To address this problem, we propose a novel network architecture based on a memory network that explicitly helps to capture long-term rain streak information in the time-lapse data.

KNN Local Attention for Image Restoration

no code implementations CVPR 2022 Hunsang Lee, Hyesong Choi, Kwanghoon Sohn, Dongbo Min

In this way, the pair-wise operation establishes non-local connectivity while maintaining the desired properties of local attention, i.e., an inductive bias of locality and linear complexity in the input resolution.
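
As a toy illustration of the k-nearest-neighbor selection behind such attention (global top-k here for brevity; the paper's windowed, linear-complexity implementation is not reproduced):

```python
import torch
import torch.nn.functional as F

# Toy KNN attention: each query attends only to its top-k most similar keys.
def knn_attention(q, k, v, knn: int = 8):
    # q, k, v: (batch, tokens, dim)
    sim = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5        # (B, N, N)
    topk_sim, topk_idx = sim.topk(knn, dim=-1)                # (B, N, knn)
    attn = F.softmax(topk_sim, dim=-1)
    # Gather the values of the selected neighbors, then aggregate.
    idx = topk_idx.unsqueeze(-1).expand(-1, -1, -1, v.shape[-1])
    neighbors = v.unsqueeze(1).expand(-1, q.shape[1], -1, -1).gather(2, idx)
    return (attn.unsqueeze(-1) * neighbors).sum(dim=2)        # (B, N, dim)

q = k = v = torch.randn(1, 64, 32)
print(knn_attention(q, k, v).shape)  # torch.Size([1, 64, 32])
```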

Deblurring · Image Denoising +3

Dual Prototypical Contrastive Learning for Few-shot Semantic Segmentation

no code implementations 9 Nov 2021 Hyeongjun Kwon, Somi Jeong, Sunok Kim, Kwanghoon Sohn

We address the problem of few-shot semantic segmentation (FSS), which aims to segment novel class objects in a target image with a few annotated samples.

Contrastive Learning · Few-Shot Semantic Segmentation +2

DIML/CVL RGB-D Dataset: 2M RGB-D Images of Natural Indoor and Outdoor Scenes

no code implementations 22 Oct 2021 Jaehoon Cho, Dongbo Min, Youngjung Kim, Kwanghoon Sohn

This manual is intended to provide a detailed description of the DIML/CVL RGB-D dataset.

Wide and Narrow: Video Prediction from Context and Motion

no code implementations 22 Oct 2021 Jaehoon Cho, Jiyoung Lee, Changjae Oh, Wonil Song, Kwanghoon Sohn

Video prediction, forecasting the future frames from a sequence of input frames, is a challenging task since the view changes are influenced by various factors, such as the global context surrounding the scene and local motion dynamics.

Video Prediction

Weakly-Supervised Learning of Disentangled and Interpretable Skills for Hierarchical Reinforcement Learning

no code implementations 29 Sep 2021 Wonil Song, Sangryul Jeon, Hyesong Choi, Kwanghoon Sohn, Dongbo Min

Given the latent representations as skills, a skill-based policy network is trained to generate trajectories similar to those produced by the learned decoder of the trajectory VAE.

Hierarchical Reinforcement Learning · Inductive Bias +3

Self-Supervised Structured Representations for Deep Reinforcement Learning

no code implementations 29 Sep 2021 Hyesong Choi, Hunsang Lee, Wonil Song, Sangryul Jeon, Kwanghoon Sohn, Dongbo Min

The proposed method imposes similarity constraints on three latent volumes: query representations warped by the estimated flows, target representations predicted by the transition model, and target representations of the future state.

Atari Games · Image Reconstruction +3

Self-balanced Learning For Domain Generalization

no code implementations 31 Aug 2021 Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn

Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics.

Domain Generalization

Learning Canonical 3D Object Representation for Fine-Grained Recognition

no code implementations ICCV 2021 Sunghun Joung, Seungryong Kim, Minsu Kim, Ig-Jae Kim, Kwanghoon Sohn

By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object and achieves competitive performance on fine-grained image recognition and vehicle re-identification.

3D Shape Reconstruction · Fine-Grained Image Recognition +3

Prototype-Guided Saliency Feature Learning for Person Search

no code implementations CVPR 2021 Hanjae Kim, Sunghun Joung, Ig-Jae Kim, Kwanghoon Sohn

Existing person search methods integrate person detection and re-identification (re-ID) modules into a unified system.

Human Detection · Person Search

CATs: Cost Aggregation Transformers for Visual Correspondence

1 code implementation NeurIPS 2021 Seokju Cho, Sunghwan Hong, Sangryul Jeon, Yunsung Lee, Kwanghoon Sohn, Seungryong Kim

We propose a novel cost aggregation network, called Cost Aggregation Transformers (CATs), to find dense correspondences between semantically similar images with additional challenges posed by large intra-class appearance and geometric variations.
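
A minimal sketch of the raw correlation (cost) volume that such aggregation networks refine, assuming L2-normalized CNN features; the transformer aggregation itself is omitted:

```python
import torch
import torch.nn.functional as F

# Cosine-correlation cost volume between two feature maps, the input a
# cost aggregation network such as CATs refines; shapes are illustrative.
def cost_volume(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    # feat_*: (batch, channels, h, w) -> similarity of all position pairs
    a = F.normalize(feat_a.flatten(2), dim=1)      # (b, c, h*w)
    b = F.normalize(feat_b.flatten(2), dim=1)
    return torch.einsum('bci,bcj->bij', a, b)      # (b, h*w, h*w)

fa, fb = torch.randn(1, 128, 16, 16), torch.randn(1, 128, 16, 16)
print(cost_volume(fa, fb).shape)  # torch.Size([1, 256, 256])
```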

Semantic correspondence

Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering

no code implementations CVPR 2021 Jungin Park, Jiyoung Lee, Kwanghoon Sohn

As a result, our method can learn question-conditioned visual representations of appearance and motion that show a powerful capability for video question answering.

Question Answering · Video Question Answering

On the confidence of stereo matching in a deep-learning era: a quantitative evaluation

1 code implementation 2 Jan 2021 Matteo Poggi, Seungryong Kim, Fabio Tosi, Sunok Kim, Filippo Aleotti, Dongbo Min, Kwanghoon Sohn, Stefano Mattoccia

Stereo matching is one of the most popular techniques to estimate dense depth maps by finding the disparity between matching pixels in two synchronized and rectified images.
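
For reference, in the standard rectified two-view setup the disparity d of a matched pixel pair converts to depth Z through the focal length f and baseline B (a textbook relation, not specific to this paper):

```latex
% Rectified stereo: disparity d = x_left - x_right, focal length f, baseline B
\[
  Z = \frac{f\,B}{d}
\]
```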

Stereo Matching

Cross-Domain Grouping and Alignment for Domain Adaptive Semantic Segmentation

1 code implementation 15 Dec 2020 Minsu Kim, Sunghun Joung, Seungryong Kim, Jungin Park, Ig-Jae Kim, Kwanghoon Sohn

Existing techniques to adapt semantic segmentation networks across the source and target domains within deep convolutional neural networks (CNNs) deal with all the samples from the two domains in a global or category-aware manner.

Clustering · Domain Adaptation +2

Adaptive confidence thresholding for monocular depth estimation

1 code implementation ICCV 2021 Hyesong Choi, Hunsang Lee, Sunkyung Kim, Sunok Kim, Seungryong Kim, Kwanghoon Sohn, Dongbo Min

To cope with the prediction error of the confidence map itself, we also leverage the threshold network that learns the threshold dynamically conditioned on the pseudo depth maps.

Monocular Depth Estimation · Stereo Matching

SumGraph: Video Summarization via Recursive Graph Modeling

no code implementations ECCV 2020 Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn

The goal of video summarization is to select keyframes that are visually diverse and can represent a whole story of an input video.

Video Summarization

Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation

no code implementations CVPR 2020 Sunghun Joung, Seungryong Kim, Hanjae Kim, Minsu Kim, Ig-Jae Kim, Junghyun Cho, Kwanghoon Sohn

To overcome this limitation, we introduce a learnable module, cylindrical convolutional networks (CCNs), that exploits a cylindrical representation of a convolutional kernel defined in 3D space.

Object · object-detection +2

Joint Learning of Semantic Alignment and Object Landmark Detection

no code implementations ICCV 2019 Sangryul Jeon, Dongbo Min, Seungryong Kim, Kwanghoon Sohn

Based on the key insight that the two tasks can mutually supervise each other, our networks accomplish this through a joint loss function that alternately imposes a consistency constraint between the two tasks, thereby boosting performance and addressing the lack of training data in a principled manner.

Object

Context-Aware Emotion Recognition Networks

1 code implementation ICCV 2019 Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, Kwanghoon Sohn

We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner.

Emotion Classification · Emotion Recognition in Context

A Large RGB-D Dataset for Semi-supervised Monocular Depth Estimation

no code implementations 23 Apr 2019 Jaehoon Cho, Dongbo Min, Youngjung Kim, Kwanghoon Sohn

In this paper, we present a simple yet effective approach for monocular depth estimation using stereo image pairs.

Monocular Depth Estimation · Semantic Segmentation

Semantic Attribute Matching Networks

no code implementations CVPR 2019 Seungryong Kim, Dongbo Min, Somi Jeong, Sunok Kim, Sangryul Jeon, Kwanghoon Sohn

SAM-Net accomplishes this through an iterative process of establishing reliable correspondences by reducing the attribute discrepancy between the images and synthesizing attribute transferred images using the learned correspondences.

Attribute

Recurrent Transformer Networks for Semantic Correspondence

1 code implementation NeurIPS 2018 Seungryong Kim, Stephen Lin, Sangryul Jeon, Dongbo Min, Kwanghoon Sohn

Our networks accomplish this through an iterative process of estimating spatial transformations between the input images and using these transformations to generate aligned convolutional activations.

General Classification · Semantic correspondence

PARN: Pyramidal Affine Regression Networks for Dense Semantic Correspondence

no code implementations ECCV 2018 Sangryul Jeon, Seungryong Kim, Dongbo Min, Kwanghoon Sohn

To the best of our knowledge, it is the first work that attempts to estimate dense affine transformation fields in a coarse-to-fine manner within deep networks.

regression · Semantic correspondence

DCTM: Discrete-Continuous Transformation Matching for Semantic Flow

no code implementations ICCV 2017 Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn

In this way, our approach draws solutions from the continuous space of affine transformations in a manner that can be computed efficiently through constant-time edge-aware filtering and a proposed affine-varying CNN-based descriptor.

Semantic correspondence

FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence

1 code implementation CVPR 2017 Seungryong Kim, Dongbo Min, Bumsub Ham, Sangryul Jeon, Stephen Lin, Kwanghoon Sohn

The sampling patterns of local structure and the self-similarity measure are jointly learned within the proposed network in an end-to-end and multi-scale manner.

Object · Semantic correspondence +1

Deeply Aggregated Alternating Minimization for Image Restoration

no code implementations CVPR 2017 Youngjung Kim, Hyungjoo Jung, Dongbo Min, Kwanghoon Sohn

The proposed framework enables convolutional neural networks (CNNs) to operate as a prior or regularizer in the alternating minimization (AM) algorithm.
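
A generic half-quadratic splitting instance of such an alternating minimization scheme, with the prior step being where a CNN can act as a learned regularizer (textbook form; the symbols are assumptions, not the paper's exact formulation):

```latex
% Split min_u ||u - f||_2^2 + lambda * Phi(u) with an auxiliary v, then alternate:
\begin{align*}
  u^{t+1} &= \arg\min_u \|u - f\|_2^2 + \beta \|u - v^{t}\|_2^2,\\
  v^{t+1} &= \arg\min_v \beta \|u^{t+1} - v\|_2^2 + \lambda\,\Phi(v).
\end{align*}
% The v-update is a proximal step on the prior Phi -- the stage a CNN can replace.
```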

Image Denoising · Image Restoration +1

DASC: Robust Dense Descriptor for Multi-modal and Multi-spectral Correspondence Estimation

no code implementations 27 Apr 2016 Seungryong Kim, Dongbo Min, Bumsub Ham, Minh N. Do, Kwanghoon Sohn

In this paper, we propose a novel dense descriptor, called dense adaptive self-correlation (DASC), to estimate multi-modal and multi-spectral dense correspondences.
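
A toy sketch of the self-correlation idea: per-pixel cosine similarity between features and spatially shifted copies of themselves (fixed illustrative shifts; not the adaptive, learned sampling of DASC):

```python
import torch
import torch.nn.functional as F

# Crude stand-in for patch-wise self-similarity: correlate each pixel's
# feature with shifted versions of the feature map.
def self_correlation(feat, shifts=((0, 1), (1, 0), (1, 1), (-1, 1))):
    f = F.normalize(feat, dim=1)                       # (b, c, h, w)
    maps = []
    for dy, dx in shifts:
        g = torch.roll(f, shifts=(dy, dx), dims=(2, 3))
        maps.append((f * g).sum(dim=1, keepdim=True))  # cosine per pixel
    return torch.cat(maps, dim=1)                      # (b, len(shifts), h, w)

feat = torch.randn(1, 32, 24, 24)
print(self_correlation(feat).shape)  # torch.Size([1, 4, 24, 24])
```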

Efficient Splitting-based Method for Global Image Smoothing

no code implementations 26 Apr 2016 Youngjung Kim, Dongbo Min, Bumsub Ham, Kwanghoon Sohn

In this paper, we introduce a highly efficient splitting-based method for global EPS that minimizes an objective function with an ℓ2 data term and prior terms (possibly non-smooth and non-convex) in linear time.
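
Written out, such an objective has the generic edge-preserving smoothing form (the symbols below are assumptions for illustration, not copied from the paper):

```latex
% l2 data term plus a (possibly non-smooth, non-convex) prior on neighbor differences:
\[
  \min_u \sum_i (u_i - f_i)^2 \;+\; \lambda \sum_{(i,j)\in\mathcal{N}} w_{ij}\,\phi(u_i - u_j)
\]
% f: input image, u: smoothed output, N: neighboring pixel pairs, phi: prior penalty.
```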

image smoothing

Deep Self-Convolutional Activations Descriptor for Dense Cross-Modal Correspondence

no code implementations 21 Mar 2016 Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn

We present a novel descriptor, called deep self-convolutional activations (DeSCA), designed for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions.

DASC: Dense Adaptive Self-Correlation Descriptor for Multi-Modal and Multi-Spectral Correspondence

no code implementations CVPR 2015 Seungryong Kim, Dongbo Min, Bumsub Ham, Seungchul Ryu, Minh N. Do, Kwanghoon Sohn

To further improve the matching quality and runtime efficiency, we propose a patch-wise receptive field pooling, in which a sampling pattern is optimized with discriminative learning.

Optical Flow Estimation
