Search Results for author: Kwanghoon Sohn

Found 55 papers, 18 papers with code

Guided Semantic Flow

no code implementations ECCV 2020 Sangryul Jeon, Dongbo Min, Seungryong Kim, Jihwan Choe, Kwanghoon Sohn

Establishing dense semantic correspondences requires dealing with large geometric variations caused by the unconstrained setting of images.

Semantic correspondence

Bridging Vision and Language Spaces with Assignment Prediction

1 code implementation 15 Apr 2024 Jungin Park, Jiyoung Lee, Kwanghoon Sohn

This paper introduces VLAP, a novel approach that bridges pretrained vision models and large language models (LLMs) to make frozen LLMs understand the visual world.

Cross-Modal Retrieval · Image Captioning +3

Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping

1 code implementation 1 Apr 2024 Hyeongjun Kwon, Jinhyun Jang, Jin Kim, Kwonyoung Kim, Kwanghoon Sohn

Visual scenes are naturally organized in a hierarchy, where a coarse semantic concept is recursively composed of several fine details.

Image Classification · Scene Understanding

Layer-wise Auto-Weighting for Non-Stationary Test-Time Adaptation

1 code implementation 10 Nov 2023 Junyoung Park, Jin Kim, Hyeongjun Kwon, Ilhoon Yoon, Kwanghoon Sohn

Given the inevitability of domain shifts during inference in real-world applications, test-time adaptation (TTA) is essential for model adaptation after deployment.

Test-time Adaptation

Semantic-aware Network for Aerial-to-Ground Image Synthesis

1 code implementation 14 Aug 2023 Jinhyun Jang, Taeyong Song, Kwanghoon Sohn

Aerial-to-ground image synthesis is an emerging and challenging problem that aims to synthesize a ground image from an aerial image.

Image Generation

Knowing Where to Focus: Event-aware Transformer for Video Grounding

1 code implementation ICCV 2023 Jinhyun Jang, Jungin Park, Jin Kim, Hyeongjun Kwon, Kwanghoon Sohn

Recent DETR-based video grounding models directly predict moment timestamps without any hand-crafted components, such as pre-defined proposals or non-maximum suppression, by learning moment queries.
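
A rough sketch of such query-based moment prediction (illustrative dimensions and names, not any paper's released code): learned queries are decoded against video features and regressed directly to normalized (center, width) timestamps, with no proposals or NMS.

```python
import torch
import torch.nn as nn

# Sketch of DETR-style moment prediction; all sizes are illustrative.
class MomentHead(nn.Module):
    def __init__(self, dim: int = 256, num_queries: int = 10):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)   # learned moment queries
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True),
            num_layers=2)
        self.span = nn.Linear(dim, 2)                   # normalized (center, width)

    def forward(self, video_feats):                     # (batch, frames, dim)
        q = self.queries.weight.unsqueeze(0).expand(video_feats.size(0), -1, -1)
        return self.span(self.decoder(q, video_feats)).sigmoid()

print(MomentHead()(torch.randn(2, 100, 256)).shape)    # torch.Size([2, 10, 2])
```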

Moment Queries · Sentence +1

Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning

1 code implementation ICCV 2023 Hanjae Kim, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn

Previous works on CZSL often struggle to capture the contextuality between attribute and object, the discriminability of visual features, and the long-tailed distribution of real-world compositional data.

Attribute · Compositional Zero-Shot Learning +1

PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification

no code implementations CVPR 2023 Minsu Kim, Seungryong Kim, Jungin Park, Seongheon Park, Kwanghoon Sohn

Modern mixture-based data augmentation can regularize models against overfitting to the training data in various computer vision applications, but a proper data augmentation technique tailored to part-based Visible-Infrared person Re-IDentification (VI-ReID) models remains unexplored.
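
For context, the generic "mixture-based technique" the excerpt refers to is mixup-style interpolation; a minimal sketch of standard mixup follows (PartMix itself mixes part-level descriptors across identities, which is not reproduced here).

```python
import torch

# Standard mixup-style augmentation (Zhang et al., 2018), the generic
# mixture-based regularizer mentioned above; not the PartMix algorithm.
def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 1.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    # Train with lam * loss(pred, y) + (1 - lam) * loss(pred, y[perm]).
    return x_mix, y, y[perm], lam
```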

Contrastive Learning · Data Augmentation +1

Probabilistic Prompt Learning for Dense Prediction

no code implementations CVPR 2023 Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, Kwanghoon Sohn

Deterministic prompt learning has recently emerged as a promising alternative for various downstream vision tasks, enabling models to learn powerful visual representations with the help of pre-trained vision-language models.

Attribute · Text Matching

Dual-path Adaptation from Image to Video Transformers

1 code implementation CVPR 2023 Jungin Park, Jiyoung Lee, Kwanghoon Sohn

In this paper, we efficiently transfer the superior representation power of vision foundation models, such as ViT and Swin, to video understanding with only a few trainable parameters.

Action Classification · Action Recognition In Videos +2

TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization

1 code implementation 16 Mar 2023 Tuan N. Tang, Kwonyoung Kim, Kwanghoon Sohn

To this end, we introduce TemporalMaxer, which minimizes long-term temporal context modeling while maximizing the information extracted from video clip features with a basic, parameter-free max-pooling block that operates on local regions.
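
A minimal sketch of such a parameter-free, local max-pooling block over clip features, assuming a (batch, channels, time) layout (illustrative, not the released implementation):

```python
import torch
import torch.nn as nn

# Parameter-free temporal max pooling over clip features; no learnable
# weights, just a local max along the time axis.
class TemporalMaxPoolBlock(nn.Module):
    def __init__(self, kernel_size: int = 3, stride: int = 2):
        super().__init__()
        self.pool = nn.MaxPool1d(kernel_size, stride=stride,
                                 padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(x)

feats = torch.randn(2, 512, 128)             # (batch, channels, time)
print(TemporalMaxPoolBlock()(feats).shape)   # torch.Size([2, 512, 64])
```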

Temporal Action Localization · Video Understanding

Local-Guided Global: Paired Similarity Representation for Visual Reinforcement Learning

no code implementations CVPR 2023 Hyesong Choi, Hunsang Lee, Wonil Song, Sangryul Jeon, Kwanghoon Sohn, Dongbo Min

Recent vision-based reinforcement learning (RL) methods have found extracting high-level features from raw pixels with self-supervised learning to be effective in learning policies.

Atari Games · reinforcement-learning +3

SimOn: A Simple Framework for Online Temporal Action Localization

1 code implementation 8 Nov 2022 Tuan N. Tang, Jungin Park, Kwonyoung Kim, Kwanghoon Sohn

In addition, the evaluation for Online Detection of Action Start (ODAS) demonstrates the effectiveness and robustness of our method in the online setting.

Temporal Action Localization

Language-free Training for Zero-shot Video Grounding

no code implementations 24 Oct 2022 Dahye Kim, Jungin Park, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn

Given an untrimmed video and a language query depicting a specific temporal moment in the video, video grounding aims to localize the time interval by understanding the text and video simultaneously.

Video Grounding

PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation

no code implementations 27 Jul 2022 Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, Kwanghoon Sohn

To mitigate this issue, we propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix, to provide a robust initialization of stereo models for online stereo adaptation.

Autonomous Driving · Meta-Learning

Probabilistic Representations for Video Contrastive Learning

no code implementations CVPR 2022 Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn

This paper presents Probabilistic Video Contrastive Learning, a self-supervised representation learning method that bridges contrastive learning with probabilistic representation.

Action Recognition · Contrastive Learning +3

Context-Preserving Instance-Level Augmentation and Deformable Convolution Networks for SAR Ship Detection

no code implementations 14 Feb 2022 Taeyong Song, Sunok Kim, SungTai Kim, Jaeseok Lee, Kwanghoon Sohn

By learning sampling offsets for the grid of a standard convolution, the network can robustly extract features from targets with shape variations for SAR ship detection.
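
A minimal sketch of offset-learning deformable convolution using torchvision's generic op (the paper's detector details are not reproduced; names are illustrative):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

# A plain conv predicts a 2D offset for each of the k*k sampling taps,
# which the deformable conv then uses to sample off the regular grid.
class DeformBlock(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(c_in, c_out, k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.deform(x, self.offset(x))

x = torch.randn(1, 64, 32, 32)
print(DeformBlock(64, 128)(x).shape)  # torch.Size([1, 128, 32, 32])
```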

Data Augmentation · Instance Segmentation +2

Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution

no code implementations 6 Feb 2022 Somi Jeong, Jiyoung Lee, Kwanghoon Sohn

We show that the proposed method produces visually diverse and plausible results in multiple domains compared to the state-of-the-art methods.

Disentanglement · Translation +1

Memory-guided Image De-raining Using Time-Lapse Data

no code implementations 6 Jan 2022 Jaehoon Cho, Seungryong Kim, Kwanghoon Sohn

To address this problem, we propose a novel network architecture based on a memory network that explicitly helps to capture long-term rain streak information in the time-lapse data.

KNN Local Attention for Image Restoration

no code implementations CVPR 2022 Hunsang Lee, Hyesong Choi, Kwanghoon Sohn, Dongbo Min

In this way, the pair-wise operation establishes non-local connectivity while maintaining the desired properties of local attention, i.e., an inductive bias of locality and linear complexity in the input resolution.
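
As a toy illustration of the k-nearest-neighbor selection behind such attention (global top-k here for brevity; the paper's windowed, linear-complexity implementation is not reproduced):

```python
import torch
import torch.nn.functional as F

# Toy KNN attention: each query attends only to its top-k most similar keys.
def knn_attention(q, k, v, knn: int = 8):
    # q, k, v: (batch, tokens, dim)
    sim = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5        # (B, N, N)
    topk_sim, topk_idx = sim.topk(knn, dim=-1)                # (B, N, knn)
    attn = F.softmax(topk_sim, dim=-1)
    # Gather the values of the selected neighbors, then aggregate.
    idx = topk_idx.unsqueeze(-1).expand(-1, -1, -1, v.shape[-1])
    neighbors = v.unsqueeze(1).expand(-1, q.shape[1], -1, -1).gather(2, idx)
    return (attn.unsqueeze(-1) * neighbors).sum(dim=2)        # (B, N, dim)

q = k = v = torch.randn(1, 64, 32)
print(knn_attention(q, k, v).shape)  # torch.Size([1, 64, 32])
```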

Deblurring · Image Denoising +3

Dual Prototypical Contrastive Learning for Few-shot Semantic Segmentation

no code implementations 9 Nov 2021 Hyeongjun Kwon, Somi Jeong, Sunok Kim, Kwanghoon Sohn

We address the problem of few-shot semantic segmentation (FSS), which aims to segment novel class objects in a target image with a few annotated samples.

Contrastive Learning · Few-Shot Semantic Segmentation +2

DIML/CVL RGB-D Dataset: 2M RGB-D Images of Natural Indoor and Outdoor Scenes

no code implementations 22 Oct 2021 Jaehoon Cho, Dongbo Min, Youngjung Kim, Kwanghoon Sohn

This manual is intended to provide a detailed description of the DIML/CVL RGB-D dataset.

Wide and Narrow: Video Prediction from Context and Motion

no code implementations 22 Oct 2021 Jaehoon Cho, Jiyoung Lee, Changjae Oh, Wonil Song, Kwanghoon Sohn

Video prediction, forecasting the future frames from a sequence of input frames, is a challenging task since the view changes are influenced by various factors, such as the global context surrounding the scene and local motion dynamics.

Video Prediction

Weakly-Supervised Learning of Disentangled and Interpretable Skills for Hierarchical Reinforcement Learning

no code implementations 29 Sep 2021 Wonil Song, Sangryul Jeon, Hyesong Choi, Kwanghoon Sohn, Dongbo Min

Given the latent representations as skills, a skill-based policy network is trained to generate trajectories similar to those produced by the learned decoder of the trajectory VAE.

Hierarchical Reinforcement Learning · Inductive Bias +3

Self-Supervised Structured Representations for Deep Reinforcement Learning

no code implementations 29 Sep 2021 Hyesong Choi, Hunsang Lee, Wonil Song, Sangryul Jeon, Kwanghoon Sohn, Dongbo Min

The proposed method imposes similarity constraints on three latent volumes: query representations warped by the estimated flows, target representations predicted by the transition model, and target representations of the future state.

Atari Games · Image Reconstruction +3

Self-balanced Learning For Domain Generalization

no code implementations 31 Aug 2021 Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn

Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics.

Domain Generalization

Learning Canonical 3D Object Representation for Fine-Grained Recognition

no code implementations ICCV 2021 Sunghun Joung, Seungryong Kim, Minsu Kim, Ig-Jae Kim, Kwanghoon Sohn

By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object and achieves competitive performance on fine-grained image recognition and vehicle re-identification.

3D Shape Reconstruction · Fine-Grained Image Recognition +3

Prototype-Guided Saliency Feature Learning for Person Search

no code implementations CVPR 2021 Hanjae Kim, Sunghun Joung, Ig-Jae Kim, Kwanghoon Sohn

Existing person search methods integrate person detection and re-identification (re-ID) modules into a unified system.

Human Detection · Person Search

CATs: Cost Aggregation Transformers for Visual Correspondence

1 code implementation NeurIPS 2021 Seokju Cho, Sunghwan Hong, Sangryul Jeon, Yunsung Lee, Kwanghoon Sohn, Seungryong Kim

We propose a novel cost aggregation network, called Cost Aggregation Transformers (CATs), to find dense correspondences between semantically similar images with additional challenges posed by large intra-class appearance and geometric variations.
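
A minimal sketch of the raw correlation (cost) volume that such aggregation networks refine, assuming L2-normalized CNN features; the transformer aggregation itself is omitted:

```python
import torch
import torch.nn.functional as F

# Cosine-correlation cost volume between two feature maps, the input a
# cost aggregation network such as CATs refines; shapes are illustrative.
def cost_volume(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    # feat_*: (batch, channels, h, w) -> similarity of all position pairs
    a = F.normalize(feat_a.flatten(2), dim=1)      # (b, c, h*w)
    b = F.normalize(feat_b.flatten(2), dim=1)
    return torch.einsum('bci,bcj->bij', a, b)      # (b, h*w, h*w)

fa, fb = torch.randn(1, 128, 16, 16), torch.randn(1, 128, 16, 16)
print(cost_volume(fa, fb).shape)  # torch.Size([1, 256, 256])
```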

Semantic correspondence

Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering

no code implementations CVPR 2021 Jungin Park, Jiyoung Lee, Kwanghoon Sohn

As a result, our method can learn question-conditioned visual representations of appearance and motion that show a powerful capability for video question answering.

Question Answering · Video Question Answering

On the confidence of stereo matching in a deep-learning era: a quantitative evaluation

1 code implementation 2 Jan 2021 Matteo Poggi, Seungryong Kim, Fabio Tosi, Sunok Kim, Filippo Aleotti, Dongbo Min, Kwanghoon Sohn, Stefano Mattoccia

Stereo matching is one of the most popular techniques to estimate dense depth maps by finding the disparity between matching pixels in two synchronized and rectified images.
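
For reference, in the standard rectified two-view setup the disparity d of a matched pixel pair converts to depth Z through the focal length f and baseline B (a textbook relation, not specific to this paper):

```latex
% Rectified stereo: disparity d = x_left - x_right, focal length f, baseline B
\[
  Z = \frac{f\,B}{d}
\]
```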

Stereo Matching

Cross-Domain Grouping and Alignment for Domain Adaptive Semantic Segmentation

1 code implementation 15 Dec 2020 Minsu Kim, Sunghun Joung, Seungryong Kim, Jungin Park, Ig-Jae Kim, Kwanghoon Sohn

Existing techniques to adapt semantic segmentation networks across the source and target domains within deep convolutional neural networks (CNNs) deal with all the samples from the two domains in a global or category-aware manner.

Clustering · Domain Adaptation +2

Adaptive confidence thresholding for monocular depth estimation

1 code implementation ICCV 2021 Hyesong Choi, Hunsang Lee, Sunkyung Kim, Sunok Kim, Seungryong Kim, Kwanghoon Sohn, Dongbo Min

To cope with the prediction error of the confidence map itself, we also leverage the threshold network that learns the threshold dynamically conditioned on the pseudo depth maps.

Monocular Depth Estimation · Stereo Matching

SumGraph: Video Summarization via Recursive Graph Modeling

no code implementations ECCV 2020 Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn

The goal of video summarization is to select keyframes that are visually diverse and can represent a whole story of an input video.

Video Summarization

Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation

no code implementations CVPR 2020 Sunghun Joung, Seungryong Kim, Hanjae Kim, Minsu Kim, Ig-Jae Kim, Junghyun Cho, Kwanghoon Sohn

To overcome this limitation, we introduce a learnable module, cylindrical convolutional networks (CCNs), that exploits a cylindrical representation of a convolutional kernel defined in 3D space.

Object · object-detection +2

Joint Learning of Semantic Alignment and Object Landmark Detection

no code implementations ICCV 2019 Sangryul Jeon, Dongbo Min, Seungryong Kim, Kwanghoon Sohn

Based on the key insight that the two tasks can mutually supervise each other, our networks accomplish this through a joint loss function that alternately imposes a consistency constraint between the two tasks, thereby boosting performance and addressing the lack of training data in a principled manner.

Object

Context-Aware Emotion Recognition Networks

1 code implementation ICCV 2019 Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, Kwanghoon Sohn

We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner.

Emotion Classification · Emotion Recognition in Context

A Large RGB-D Dataset for Semi-supervised Monocular Depth Estimation

no code implementations 23 Apr 2019 Jaehoon Cho, Dongbo Min, Youngjung Kim, Kwanghoon Sohn

In this paper, we present a simple yet effective approach for monocular depth estimation using stereo image pairs.

Monocular Depth Estimation · Semantic Segmentation

Semantic Attribute Matching Networks

no code implementations CVPR 2019 Seungryong Kim, Dongbo Min, Somi Jeong, Sunok Kim, Sangryul Jeon, Kwanghoon Sohn

SAM-Net accomplishes this through an iterative process of establishing reliable correspondences by reducing the attribute discrepancy between the images and synthesizing attribute transferred images using the learned correspondences.

Attribute

Recurrent Transformer Networks for Semantic Correspondence

1 code implementation NeurIPS 2018 Seungryong Kim, Stephen Lin, Sangryul Jeon, Dongbo Min, Kwanghoon Sohn

Our networks accomplish this through an iterative process of estimating spatial transformations between the input images and using these transformations to generate aligned convolutional activations.

General Classification · Semantic correspondence

PARN: Pyramidal Affine Regression Networks for Dense Semantic Correspondence

no code implementations ECCV 2018 Sangryul Jeon, Seungryong Kim, Dongbo Min, Kwanghoon Sohn

To the best of our knowledge, it is the first work that attempts to estimate dense affine transformation fields in a coarse-to-fine manner within deep networks.

regression · Semantic correspondence

DCTM: Discrete-Continuous Transformation Matching for Semantic Flow

no code implementations ICCV 2017 Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn

In this way, our approach draws solutions from the continuous space of affine transformations in a manner that can be computed efficiently through constant-time edge-aware filtering and a proposed affine-varying CNN-based descriptor.

Semantic correspondence

FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence

1 code implementation CVPR 2017 Seungryong Kim, Dongbo Min, Bumsub Ham, Sangryul Jeon, Stephen Lin, Kwanghoon Sohn

The sampling patterns of local structure and the self-similarity measure are jointly learned within the proposed network in an end-to-end and multi-scale manner.

Object · Semantic correspondence +1

Deeply Aggregated Alternating Minimization for Image Restoration

no code implementations CVPR 2017 Youngjung Kim, Hyungjoo Jung, Dongbo Min, Kwanghoon Sohn

The proposed framework enables convolutional neural networks (CNNs) to operate as a prior or regularizer in the alternating minimization (AM) algorithm.
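
A generic half-quadratic splitting instance of such an alternating minimization scheme, with the prior step being where a CNN can act as a learned regularizer (textbook form; the symbols are assumptions, not the paper's exact formulation):

```latex
% Split min_u ||u - f||_2^2 + lambda * Phi(u) with an auxiliary v, then alternate:
\begin{align*}
  u^{t+1} &= \arg\min_u \|u - f\|_2^2 + \beta \|u - v^{t}\|_2^2,\\
  v^{t+1} &= \arg\min_v \beta \|u^{t+1} - v\|_2^2 + \lambda\,\Phi(v).
\end{align*}
% The v-update is a proximal step on the prior Phi -- the stage a CNN can replace.
```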

Image Denoising · Image Restoration +1

DASC: Robust Dense Descriptor for Multi-modal and Multi-spectral Correspondence Estimation

no code implementations 27 Apr 2016 Seungryong Kim, Dongbo Min, Bumsub Ham, Minh N. Do, Kwanghoon Sohn

In this paper, we propose a novel dense descriptor, called dense adaptive self-correlation (DASC), to estimate multi-modal and multi-spectral dense correspondences.
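
A toy sketch of the self-correlation idea: per-pixel cosine similarity between features and spatially shifted copies of themselves (fixed illustrative shifts; not the adaptive, learned sampling of DASC):

```python
import torch
import torch.nn.functional as F

# Crude stand-in for patch-wise self-similarity: correlate each pixel's
# feature with shifted versions of the feature map.
def self_correlation(feat, shifts=((0, 1), (1, 0), (1, 1), (-1, 1))):
    f = F.normalize(feat, dim=1)                       # (b, c, h, w)
    maps = []
    for dy, dx in shifts:
        g = torch.roll(f, shifts=(dy, dx), dims=(2, 3))
        maps.append((f * g).sum(dim=1, keepdim=True))  # cosine per pixel
    return torch.cat(maps, dim=1)                      # (b, len(shifts), h, w)

feat = torch.randn(1, 32, 24, 24)
print(self_correlation(feat).shape)  # torch.Size([1, 4, 24, 24])
```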

Efficient Splitting-based Method for Global Image Smoothing

no code implementations 26 Apr 2016 Youngjung Kim, Dongbo Min, Bumsub Ham, Kwanghoon Sohn

In this paper, we introduce a highly efficient splitting-based method for global EPS that minimizes an objective function with an ℓ2 data term and prior terms (possibly non-smooth and non-convex) in linear time.
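
Written out, such an objective has the generic edge-preserving smoothing form (the symbols below are assumptions for illustration, not copied from the paper):

```latex
% l2 data term plus a (possibly non-smooth, non-convex) prior on neighbor differences:
\[
  \min_u \sum_i (u_i - f_i)^2 \;+\; \lambda \sum_{(i,j)\in\mathcal{N}} w_{ij}\,\phi(u_i - u_j)
\]
% f: input image, u: smoothed output, N: neighboring pixel pairs, phi: prior penalty.
```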

image smoothing

Deep Self-Convolutional Activations Descriptor for Dense Cross-Modal Correspondence

no code implementations 21 Mar 2016 Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn

We present a novel descriptor, called deep self-convolutional activations (DeSCA), designed for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions.

DASC: Dense Adaptive Self-Correlation Descriptor for Multi-Modal and Multi-Spectral Correspondence

no code implementations CVPR 2015 Seungryong Kim, Dongbo Min, Bumsub Ham, Seungchul Ryu, Minh N. Do, Kwanghoon Sohn

To further improve the matching quality and runtime efficiency, we propose a patch-wise receptive field pooling, in which a sampling pattern is optimized with discriminative learning.

Optical Flow Estimation
