Dense Text-to-Image Generation with Attention Modulation

ICCV 2023 Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, Jun-Yan Zhu

To address this, we propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions while offering control over the scene layout.

Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning

ICCV 2023 Hanjae Kim, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn

Previous works for CZSL often struggle to grasp the contextuality between attribute and object, to learn discriminative visual features, and to handle the long-tailed distribution of real-world compositional data.

Compositional Zero-Shot Learning

Panoramic Image-to-Image Translation

11 Apr 2023 Soohyun Kim, Junho Kim, Taekyung Kim, Hwan Heo, Seungryong Kim, Jiyoung Lee, Jin-Hwa Kim

This task is difficult due to the geometric distortion of panoramic images and the lack of a panoramic image dataset with diverse conditions, like weather or time.

Image-to-Image Translation Translation

Three Recipes for Better 3D Pseudo-GTs of 3D Human Mesh Estimation in the Wild

10 Apr 2023 Gyeongsik Moon, Hongsuk Choi, Sanghyuk Chun, Jiyoung Lee, Sangdoo Yun

Recovering 3D human mesh in the wild is greatly challenging as in-the-wild (ITW) datasets provide only 2D pose ground truths (GTs).

3D Multi-Person Pose Estimation

Dual-path Adaptation from Image to Video Transformers

CVPR 2023 Jungin Park, Jiyoung Lee, Kwanghoon Sohn

In this paper, we efficiently transfer the surpassing representation power of the vision foundation models, such as ViT and Swin, for video understanding with only a few trainable parameters.

Action Classification Action Recognition In Videos +2

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation

14 Mar 2023 Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Jaehoon Ko, Hyeonsu Kim, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim

Text-to-3D generation has shown rapid progress in recent days with the advent of score distillation, a methodology of using pretrained text-to-2D diffusion models to optimize neural radiance field (NeRF) in the zero-shot setting.

Single-View 3D Reconstruction Text to 3D

Robust Camera Pose Refinement for Multi-Resolution Hash Encoding

3 Feb 2023 Hwan Heo, Taekyung Kim, Jiyoung Lee, Jaewon Lee, Soohyun Kim, Hyunwoo J. Kim, Jin-Hwa Kim

Multi-resolution hash encoding has recently been proposed to reduce the computational cost of neural renderings, such as NeRF.

Neural Rendering Novel View Synthesis

Semi-Parametric Video-Grounded Text Generation

27 Jan 2023 Sungdong Kim, Jin-Hwa Kim, Jiyoung Lee, Minjoon Seo

Efficient video-language modeling should consider the computational cost because of a large, sometimes intractable, number of video frames.

Language Modelling Text Generation +2

Language-free Training for Zero-shot Video Grounding

24 Oct 2022 Dahye Kim, Jungin Park, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn

Given an untrimmed video and a language query depicting a specific temporal moment in the video, video grounding aims to localize the time interval by understanding the text and video simultaneously.

Video Grounding

MIDMs: Matching Interleaved Diffusion Models for Exemplar-based Image Translation

22 Sep 2022 Junyoung Seo, Gyuseong Lee, Seokju Cho, Jiyoung Lee, Seungryong Kim

Specifically, we formulate a diffusion-based matching-and-generation framework that interleaves cross-domain matching and diffusion steps in the latent space by iteratively feeding the intermediate warp into the noising process and denoising it to generate a translated image.

Denoising Translation

Automatic Detection of Noisy Electrocardiogram Signals without Explicit Noise Labels

8 Aug 2022 Radhika Dua, Jiyoung Lee, Joon-Myoung Kwon, Edward Choi

Automatic deep learning-based examination of ECG signals can lead to inaccurate diagnosis, and manual analysis involves rejection of noisy ECG samples by clinicians, which might cost extra time.

PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation

27 Jul 2022 Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, Kwanghoon Sohn

To mitigate this issue, we propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix, to provide a robust initialization of stereo models for online stereo adaptation.

Autonomous Driving Meta-Learning

Mutual Information Divergence: A Unified Metric for Multimodal Generative Models

25 May 2022 Jin-Hwa Kim, Yunji Kim, Jiyoung Lee, Kang Min Yoo, Sang-Woo Lee

Based on a recent trend that multimodal generative evaluations exploit a vision-and-language pre-trained model, we propose the negative Gaussian cross-mutual information using the CLIP features as a unified metric, coined Mutual Information Divergence (MID).

Hallucination Pair-wise Detection (1-ref) Hallucination Pair-wise Detection (4-ref) +4

Probabilistic Representations for Video Contrastive Learning

CVPR 2022 Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn

This paper presents Probabilistic Video Contrastive Learning, a self-supervised representation learning method that bridges contrastive learning with probabilistic representation.

Action Recognition Contrastive Learning +3

Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution

6 Feb 2022 Somi Jeong, Jiyoung Lee, Kwanghoon Sohn

We show that the proposed method produces visually diverse and plausible results in multiple domains compared to the state-of-the-art methods.

Disentanglement Translation +1

Exploration into Translation-Equivariant Image Quantization

1 Dec 2021 Woncheol Shin, Gyubok Lee, Jiyoung Lee, Eunyi Lyou, Joonseok Lee, Edward Choi

This is an exploratory study that finds that current image quantization (vector quantization) methods do not satisfy translation equivariance in the quantized space due to aliasing.

Quantization Text Generation +1

Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding

12 Nov 2021 Kyunghoon Hur, Jiyoung Lee, JungWoo Oh, Wesley Price, Young-Hak Kim, Edward Choi

EHR systems lack a unified code system for representing medical concepts, which acts as a barrier for the deployment of deep learning models in large scale to multiple clinics and hospitals.

Representation Learning

Wide and Narrow: Video Prediction from Context and Motion

22 Oct 2021 Jaehoon Cho, Jiyoung Lee, Changjae Oh, Wonil Song, Kwanghoon Sohn

Video prediction, forecasting the future frames from a sequence of input frames, is a challenging task since the view changes are influenced by various factors, such as the global context surrounding the scene and local motion dynamics.

Video Prediction

Self-balanced Learning For Domain Generalization

31 Aug 2021 Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn

Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics.

Domain Generalization

Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding

8 Aug 2021 Kyunghoon Hur, Jiyoung Lee, JungWoo Oh, Wesley Price, Young-Hak Kim, Edward Choi

To overcome this problem, we introduce Description-based Embedding, DescEmb, a code-agnostic description-based representation learning framework for predictive modeling on EHR.

Representation Learning Transfer Learning

Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering

CVPR 2021 Jungin Park, Jiyoung Lee, Kwanghoon Sohn

As a result, our method can learn the question conditioned visual representations attributed to appearance and motion that show powerful capability for video question answering.

Question Answering Video Question Answering

SumGraph: Video Summarization via Recursive Graph Modeling

ECCV 2020 Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn

The goal of video summarization is to select keyframes that are visually diverse and can represent a whole story of an input video.

Video Summarization

Context-Aware Emotion Recognition Networks

ICCV 2019 Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, Kwanghoon Sohn

We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner.

Emotion Recognition in Context

