Search Results for author: Jiyoung Lee

Found 32 papers, 15 papers with code

Bridging Vision and Language Spaces with Assignment Prediction

1 code implementation 15 Apr 2024 Jungin Park, Jiyoung Lee, Kwanghoon Sohn

This paper introduces VLAP, a novel approach that bridges pretrained vision models and large language models (LLMs) to make frozen LLMs understand the visual world.

Cross-Modal Retrieval Image Captioning +3

KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge

no code implementations 21 Feb 2024 Jiyoung Lee, Minwoo Kim, Seungho Kim, Junghwan Kim, Seunghyun Won, Hwaran Lee, Edward Choi

For the common knowledge dataset, we constructed samples based on Korean textbooks and GED reference materials.

4k Multiple-choice

Dense Text-to-Image Generation with Attention Modulation

1 code implementation ICCV 2023 Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, Jun-Yan Zhu

To address this, we propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions while offering control over the scene layout.

Text-to-Image Generation

Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning

1 code implementation ICCV 2023 Hanjae Kim, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn

Previous works on CZSL often struggle with the contextuality between attribute and object, the discriminability of visual features, and the long-tailed distribution of real-world compositional data.

Attribute Compositional Zero-Shot Learning +1

Panoramic Image-to-Image Translation

no code implementations 11 Apr 2023 Soohyun Kim, Junho Kim, Taekyung Kim, Hwan Heo, Seungryong Kim, Jiyoung Lee, Jin-Hwa Kim

This task is difficult due to the geometric distortion of panoramic images and the lack of a panoramic image dataset with diverse conditions, like weather or time.

Image-to-Image Translation Translation

Three Recipes for Better 3D Pseudo-GTs of 3D Human Mesh Estimation in the Wild

1 code implementation 10 Apr 2023 Gyeongsik Moon, Hongsuk Choi, Sanghyuk Chun, Jiyoung Lee, Sangdoo Yun

Recovering 3D human meshes in the wild is highly challenging, as in-the-wild (ITW) datasets provide only 2D pose ground truths (GTs).

3D Multi-Person Pose Estimation

Dual-path Adaptation from Image to Video Transformers

1 code implementation CVPR 2023 Jungin Park, Jiyoung Lee, Kwanghoon Sohn

In this paper, we efficiently transfer the strong representation power of vision foundation models, such as ViT and Swin, to video understanding with only a few trainable parameters.

Action Classification Action Recognition In Videos +2

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation

1 code implementation 14 Mar 2023 Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Hyeonsu Kim, Jaehoon Ko, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim

Text-to-3D generation has shown rapid progress in recent days with the advent of score distillation, a methodology of using pretrained text-to-2D diffusion models to optimize neural radiance field (NeRF) in the zero-shot setting.

3D Generation Single-View 3D Reconstruction +1

Robust Camera Pose Refinement for Multi-Resolution Hash Encoding

no code implementations 3 Feb 2023 Hwan Heo, Taekyung Kim, Jiyoung Lee, Jaewon Lee, Soohyun Kim, Hyunwoo J. Kim, Jin-Hwa Kim

Multi-resolution hash encoding has recently been proposed to reduce the computational cost of neural renderings, such as NeRF.

Neural Rendering Novel View Synthesis

Semi-Parametric Video-Grounded Text Generation

no code implementations 27 Jan 2023 Sungdong Kim, Jin-Hwa Kim, Jiyoung Lee, Minjoon Seo

Efficient video-language modeling should consider the computational cost because of a large, sometimes intractable, number of video frames.

Language Modelling Text Generation +2

Language-free Training for Zero-shot Video Grounding

no code implementations 24 Oct 2022 Dahye Kim, Jungin Park, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn

Given an untrimmed video and a language query depicting a specific temporal moment in the video, video grounding aims to localize the time interval by understanding the text and video simultaneously.

Video Grounding

MIDMs: Matching Interleaved Diffusion Models for Exemplar-based Image Translation

1 code implementation 22 Sep 2022 Junyoung Seo, Gyuseong Lee, Seokju Cho, Jiyoung Lee, Seungryong Kim

Specifically, we formulate a diffusion-based matching-and-generation framework that interleaves cross-domain matching and diffusion steps in the latent space by iteratively feeding the intermediate warp into the noising process and denoising it to generate a translated image.

Denoising Translation

Automatic Detection of Noisy Electrocardiogram Signals without Explicit Noise Labels

no code implementations 8 Aug 2022 Radhika Dua, Jiyoung Lee, Joon-Myoung Kwon, Edward Choi

Automatic deep learning-based examination of ECG signals can lead to inaccurate diagnosis, and manual analysis involves rejection of noisy ECG samples by clinicians, which might cost extra time.

PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation

no code implementations 27 Jul 2022 Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, Kwanghoon Sohn

To mitigate this issue, we propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix, to provide a robust initialization of stereo models for online stereo adaptation.

Autonomous Driving Meta-Learning

Mutual Information Divergence: A Unified Metric for Multimodal Generative Models

1 code implementation 25 May 2022 Jin-Hwa Kim, Yunji Kim, Jiyoung Lee, Kang Min Yoo, Sang-Woo Lee

Based on a recent trend that multimodal generative evaluations exploit a vision-and-language pre-trained model, we propose the negative Gaussian cross-mutual information using the CLIP features as a unified metric, coined Mutual Information Divergence (MID).

Hallucination Pair-wise Detection (1-ref) Hallucination Pair-wise Detection (4-ref) +5

Probabilistic Representations for Video Contrastive Learning

no code implementations CVPR 2022 Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn

This paper presents Probabilistic Video Contrastive Learning, a self-supervised representation learning method that bridges contrastive learning with probabilistic representation.

Action Recognition Contrastive Learning +3

Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution

no code implementations 6 Feb 2022 Somi Jeong, Jiyoung Lee, Kwanghoon Sohn

We show that the proposed method produces visually diverse and plausible results in multiple domains compared to the state-of-the-art methods.

Disentanglement Translation +1

Exploration into Translation-Equivariant Image Quantization

2 code implementations 1 Dec 2021 Woncheol Shin, Gyubok Lee, Jiyoung Lee, Eunyi Lyou, Joonseok Lee, Edward Choi

This is an exploratory study that finds that current image quantization (vector quantization) methods do not satisfy translation equivariance in the quantized space due to aliasing.

Quantization Text Generation +2

Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding

1 code implementation 12 Nov 2021 Kyunghoon Hur, Jiyoung Lee, JungWoo Oh, Wesley Price, Young-Hak Kim, Edward Choi

EHR systems lack a unified code system for representing medical concepts, which acts as a barrier for the deployment of deep learning models in large scale to multiple clinics and hospitals.

Representation Learning

Wide and Narrow: Video Prediction from Context and Motion

no code implementations 22 Oct 2021 Jaehoon Cho, Jiyoung Lee, Changjae Oh, Wonil Song, Kwanghoon Sohn

Video prediction, forecasting the future frames from a sequence of input frames, is a challenging task since the view changes are influenced by various factors, such as the global context surrounding the scene and local motion dynamics.

Video Prediction

Self-balanced Learning For Domain Generalization

no code implementations 31 Aug 2021 Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn

Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics.

Domain Generalization

Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding

1 code implementation 8 Aug 2021 Kyunghoon Hur, Jiyoung Lee, JungWoo Oh, Wesley Price, Young-Hak Kim, Edward Choi

To overcome this problem, we introduce Description-based Embedding, DescEmb, a code-agnostic description-based representation learning framework for predictive modeling on EHR.

Representation Learning Transfer Learning

Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering

no code implementations CVPR 2021 Jungin Park, Jiyoung Lee, Kwanghoon Sohn

As a result, our method can learn the question conditioned visual representations attributed to appearance and motion that show powerful capability for video question answering.

Question Answering Video Question Answering

SumGraph: Video Summarization via Recursive Graph Modeling

no code implementations ECCV 2020 Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn

The goal of video summarization is to select keyframes that are visually diverse and can represent a whole story of an input video.

Video Summarization

Context-Aware Emotion Recognition Networks

1 code implementation ICCV 2019 Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, Kwanghoon Sohn

We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner.

Emotion Classification Emotion Recognition in Context
