Search Results for author: Jiyoung Lee

Found 32 papers, 15 papers with code

Bridging Vision and Language Spaces with Assignment Prediction

1 code implementation • 15 Apr 2024 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn

This paper introduces VLAP, a novel approach that bridges pretrained vision models and large language models (LLMs) to make frozen LLMs understand the visual world.

Cross-Modal Retrieval • Image Captioning • +3

KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge

no code implementations • 21 Feb 2024 • Jiyoung Lee, Minwoo Kim, Seungho Kim, Junghwan Kim, Seunghyun Won, Hwaran Lee, Edward Choi

For the common knowledge dataset, we constructed samples based on Korean textbooks and GED reference materials.

4k • Multiple-choice

Dense Text-to-Image Generation with Attention Modulation

1 code implementation • ICCV 2023 • Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, Jun-Yan Zhu

To address this, we propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions while offering control over the scene layout.

Text-to-Image Generation

Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning

1 code implementation • ICCV 2023 • Hanjae Kim, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn

Previous works for CZSL often struggle with the contextuality between attribute and object, the discriminability of visual features, and the long-tailed distribution of real-world compositional data.

Attribute • Compositional Zero-Shot Learning • +1

VisAlign: Dataset for Measuring the Degree of Alignment between AI and Humans in Visual Perception

1 code implementation • 3 Aug 2023 • Jiyoung Lee, Seungho Kim, Seunghyun Won, Joonseok Lee, Marzyeh Ghassemi, James Thorne, Jaeseok Choi, O-Kil Kwon, Edward Choi

In this paper, we focus on the models' visual perception alignment with humans, further referred to as AI-human visual alignment.

Image Classification

Panoramic Image-to-Image Translation

no code implementations • 11 Apr 2023 • Soohyun Kim, Junho Kim, Taekyung Kim, Hwan Heo, Seungryong Kim, Jiyoung Lee, Jin-Hwa Kim

This task is difficult due to the geometric distortion of panoramic images and the lack of a panoramic image dataset with diverse conditions, like weather or time.

Image-to-Image Translation • Translation

Three Recipes for Better 3D Pseudo-GTs of 3D Human Mesh Estimation in the Wild

1 code implementation • 10 Apr 2023 • Gyeongsik Moon, Hongsuk Choi, Sanghyuk Chun, Jiyoung Lee, Sangdoo Yun

Recovering 3D human mesh in the wild is greatly challenging as in-the-wild (ITW) datasets provide only 2D pose ground truths (GTs).

Ranked #6 on 3D Multi-Person Pose Estimation on MuPoTS-3D

3D Multi-Person Pose Estimation

Dual-path Adaptation from Image to Video Transformers

1 code implementation • CVPR 2023 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn

In this paper, we efficiently transfer the surpassing representation power of the vision foundation models, such as ViT and Swin, for video understanding with only a few trainable parameters.

Ranked #1 on Action Classification on Diving-48

Action Classification • Action Recognition In Videos • +2

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation

1 code implementation • 14 Mar 2023 • Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Hyeonsu Kim, Jaehoon Ko, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim

Text-to-3D generation has shown rapid progress in recent days with the advent of score distillation, a methodology of using pretrained text-to-2D diffusion models to optimize neural radiance field (NeRF) in the zero-shot setting.

3D Generation • Single-View 3D Reconstruction • +1

Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech

1 code implementation • 27 Feb 2023 • Jiyoung Lee, Joon Son Chung, Soo-Whan Chung

This is the first time that face images are used as a condition to train a TTS model.

Speech Synthesis • Text-To-Speech Synthesis

Robust Camera Pose Refinement for Multi-Resolution Hash Encoding

no code implementations • 3 Feb 2023 • Hwan Heo, Taekyung Kim, Jiyoung Lee, Jaewon Lee, Soohyun Kim, Hyunwoo J. Kim, Jin-Hwa Kim

Multi-resolution hash encoding has recently been proposed to reduce the computational cost of neural renderings, such as NeRF.

Neural Rendering • Novel View Synthesis

Semi-Parametric Video-Grounded Text Generation

no code implementations • 27 Jan 2023 • Sungdong Kim, Jin-Hwa Kim, Jiyoung Lee, Minjoon Seo

Efficient video-language modeling should consider the computational cost because of a large, sometimes intractable, number of video frames.

Ranked #12 on Video Question Answering on NExT-QA

Language Modelling • Text Generation • +2

Specializing Multi-domain NMT via Penalizing Low Mutual Information

no code implementations • 24 Oct 2022 • Jiyoung Lee, Hantae Kim, Hyunchang Cho, Edward Choi, Cheonbok Park

Multi-domain Neural Machine Translation (NMT) trains a single model with multiple domains.

Machine Translation • NMT • +1

Language-free Training for Zero-shot Video Grounding

no code implementations • 24 Oct 2022 • Dahye Kim, Jungin Park, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn

Given an untrimmed video and a language query depicting a specific temporal moment in the video, video grounding aims to localize the time interval by understanding the text and video simultaneously.

Video Grounding

MIDMs: Matching Interleaved Diffusion Models for Exemplar-based Image Translation

1 code implementation • 22 Sep 2022 • Junyoung Seo, Gyuseong Lee, Seokju Cho, Jiyoung Lee, Seungryong Kim

Specifically, we formulate a diffusion-based matching-and-generation framework that interleaves cross-domain matching and diffusion steps in the latent space by iteratively feeding the intermediate warp into the noising process and denoising it to generate a translated image.

Denoising • Translation

Automatic Detection of Noisy Electrocardiogram Signals without Explicit Noise Labels

no code implementations • 8 Aug 2022 • Radhika Dua, Jiyoung Lee, Joon-Myoung Kwon, Edward Choi

Automatic deep learning-based examination of ECG signals can lead to inaccurate diagnosis, and manual analysis involves rejection of noisy ECG samples by clinicians, which might cost extra time.

PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation

no code implementations • 27 Jul 2022 • Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, Kwanghoon Sohn

To mitigate this issue, we propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix, to provide a robust initialization of stereo models for online stereo adaptation.

Autonomous Driving • Meta-Learning

Mutual Information Divergence: A Unified Metric for Multimodal Generative Models

1 code implementation • 25 May 2022 • Jin-Hwa Kim, Yunji Kim, Jiyoung Lee, Kang Min Yoo, Sang-Woo Lee

Based on a recent trend that multimodal generative evaluations exploit a vision-and-language pre-trained model, we propose the negative Gaussian cross-mutual information using the CLIP features as a unified metric, coined by Mutual Information Divergence (MID).

Ranked #1 on Human Judgment Classification on Pascal-50S

Hallucination Pair-wise Detection (1-ref) • Hallucination Pair-wise Detection (4-ref) • +5

Probabilistic Representations for Video Contrastive Learning

no code implementations • CVPR 2022 • Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn

This paper presents Probabilistic Video Contrastive Learning, a self-supervised representation learning method that bridges contrastive learning with probabilistic representation.

Action Recognition • Contrastive Learning • +3

Pin the Memory: Learning to Generalize Semantic Segmentation

1 code implementation • CVPR 2022 • Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn

The rise of deep neural networks has led to several breakthroughs for semantic segmentation.

Domain Generalization • Meta-Learning • +2

Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution

no code implementations • 6 Feb 2022 • Somi Jeong, Jiyoung Lee, Kwanghoon Sohn

We show that the proposed method produces visually diverse and plausible results in multiple domains compared to the state-of-the-art methods.

Disentanglement • Translation • +1

Exploration into Translation-Equivariant Image Quantization

2 code implementations • 1 Dec 2021 • Woncheol Shin, Gyubok Lee, Jiyoung Lee, Eunyi Lyou, Joonseok Lee, Edward Choi

This is an exploratory study that finds that current image quantization (vector quantization) methods do not satisfy translation equivariance in the quantized space due to aliasing.

Quantization • Text Generation • +2

Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding

1 code implementation • 12 Nov 2021 • Kyunghoon Hur, Jiyoung Lee, JungWoo Oh, Wesley Price, Young-Hak Kim, Edward Choi

EHR systems lack a unified code system for representing medical concepts, which acts as a barrier for the deployment of deep learning models in large scale to multiple clinics and hospitals.

Representation Learning

Conditional Generation of Periodic Signals with Fourier-Based Decoder

no code implementations • 24 Oct 2021 • Jiyoung Lee, Wonjae Kim, Daehoon Gwak, Edward Choi

Periodic signals play an important role in daily life.

Imputation

Wide and Narrow: Video Prediction from Context and Motion

no code implementations • 22 Oct 2021 • Jaehoon Cho, Jiyoung Lee, Changjae Oh, Wonil Song, Kwanghoon Sohn

Video prediction, forecasting the future frames from a sequence of input frames, is a challenging task since the view changes are influenced by various factors, such as the global context surrounding the scene and local motion dynamics.

Video Prediction

Self-balanced Learning For Domain Generalization

no code implementations • 31 Aug 2021 • Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn

Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics.

Domain Generalization

Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding

1 code implementation • 8 Aug 2021 • Kyunghoon Hur, Jiyoung Lee, JungWoo Oh, Wesley Price, Young-Hak Kim, Edward Choi

To overcome this problem, we introduce Description-based Embedding, DescEmb, a code-agnostic description-based representation learning framework for predictive modeling on EHR.

Representation Learning • Transfer Learning

CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning

no code implementations • 25 Jun 2021 • Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, Ashish Kapoor

The ability to perform causal and counterfactual reasoning is a central property of human intelligence.

Causal Discovery • counterfactual • +2

Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering

no code implementations • CVPR 2021 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn

As a result, our method can learn the question conditioned visual representations attributed to appearance and motion that show powerful capability for video question answering.

Question Answering • Video Question Answering

Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation

no code implementations • CVPR 2021 • Jiyoung Lee, Soo-Whan Chung, Sunok Kim, Hong-Goo Kang, Kwanghoon Sohn

In this paper, we address the problem of separating individual speech signals from videos using audio-visual neural processing.

Audio-Visual Synchronization • Speech Separation

SumGraph: Video Summarization via Recursive Graph Modeling

no code implementations • ECCV 2020 • Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn

The goal of video summarization is to select keyframes that are visually diverse and can represent a whole story of an input video.

Video Summarization

Context-Aware Emotion Recognition Networks

1 code implementation • ICCV 2019 • Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, Kwanghoon Sohn

We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner.

Ranked #1 on Emotion Recognition in Context on CAER-Dynamic

Emotion Classification • Emotion Recognition in Context
