1 code implementation • 25 Mar 2025 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn
View-invariant representation learning from egocentric (first-person, ego) and exocentric (third-person, exo) videos is a promising approach toward generalizing video understanding systems across multiple viewpoints.
no code implementations • 13 Oct 2024 • Soyoung Yang, Hojun Cho, Jiyoung Lee, Sohee Yoon, Edward Choi, Jaegul Choo, Won Ik Cho
Aspect-based sentiment analysis (ABSA) is the challenging task of extracting sentiment along with its corresponding aspects and opinions from human language.
Aspect-Based Sentiment Analysis (ABSA) +2
1 code implementation • 8 Jul 2024 • Yujin Jeong, Yunji Kim, Sanghyuk Chun, Jiyoung Lee
Despite the impressive progress of multimodal generative models, video-to-audio generation still suffers from limited performance and offers little flexibility to prioritize sound synthesis for specific objects within the scene.
Ranked #7 on Video-to-Sound Generation on VGG-Sound
1 code implementation • 15 Apr 2024 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn
This paper introduces VLAP, a novel approach that bridges pretrained vision models and large language models (LLMs) to make frozen LLMs understand the visual world.
no code implementations • 21 Feb 2024 • Jiyoung Lee, Minwoo Kim, Seungho Kim, Junghwan Kim, Seunghyun Won, Hwaran Lee, Edward Choi
For the common knowledge dataset, we constructed samples based on Korean textbooks and GED reference materials.
1 code implementation • ICCV 2023 • Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, Jun-Yan Zhu
To address this, we propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions while offering control over the scene layout.
1 code implementation • ICCV 2023 • Hanjae Kim, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn
Previous works on CZSL often struggle to capture the contextuality between attributes and objects and the discriminability of visual features, and to handle the long-tailed distribution of real-world compositional data.
1 code implementation • 3 Aug 2023 • Jiyoung Lee, Seungho Kim, Seunghyun Won, Joonseok Lee, Marzyeh Ghassemi, James Thorne, Jaeseok Choi, O-Kil Kwon, Edward Choi
In this paper, we focus on models' visual perception alignment with humans, hereafter referred to as AI-human visual alignment.
no code implementations • 11 Apr 2023 • Soohyun Kim, Junho Kim, Taekyung Kim, Hwan Heo, Seungryong Kim, Jiyoung Lee, Jin-Hwa Kim
This task is difficult due to the geometric distortion of panoramic images and the lack of a panoramic image dataset with diverse conditions, like weather or time.
1 code implementation • 10 Apr 2023 • Gyeongsik Moon, Hongsuk Choi, Sanghyuk Chun, Jiyoung Lee, Sangdoo Yun
Recovering 3D human mesh in the wild is greatly challenging as in-the-wild (ITW) datasets provide only 2D pose ground truths (GTs).
Ranked #6 on 3D Multi-Person Pose Estimation on MuPoTS-3D
1 code implementation • CVPR 2023 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn
In this paper, we efficiently transfer the superior representation power of vision foundation models, such as ViT and Swin, to video understanding with only a few trainable parameters.
Ranked #1 on Action Classification on Diving-48
1 code implementation • 14 Mar 2023 • Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Hyeonsu Kim, Jaehoon Ko, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim
Text-to-3D generation has shown rapid progress recently with the advent of score distillation, a methodology that uses pretrained text-to-2D diffusion models to optimize neural radiance fields (NeRF) in the zero-shot setting.
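The score-distillation idea mentioned above can be sketched in a few lines: noise a rendered view, ask a pretrained text-conditioned denoiser for its noise estimate, and use the weighted residual as a gradient signal for the renderer. This is a minimal numpy illustration of that generic recipe, not this paper's method; the toy denoiser, timestep, and weighting are illustrative assumptions.

```python
import numpy as np

def sds_grad(x, denoiser, t, alpha_bar, rng):
    """Score-distillation-style gradient: noise the rendered image x,
    query the denoiser, and return the weighted noise residual."""
    eps = rng.normal(size=x.shape)
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps  # forward diffusion
    eps_hat = denoiser(x_t, t)          # stands in for a pretrained text-to-2D model
    w = 1.0 - alpha_bar                 # a common timestep weighting choice
    return w * (eps_hat - eps)          # backpropagated into the NeRF parameters

# Toy denoiser for the demo (a real one is text-conditioned and learned).
toy_denoiser = lambda x_t, t: x_t
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 3))          # a "rendered view" from the NeRF
g = sds_grad(x, toy_denoiser, t=500, alpha_bar=0.5, rng=rng)
print(g.shape)  # same shape as the rendered view
```

In practice this gradient is applied per rendered view, so the 3D representation is never supervised with 3D data, only through the 2D diffusion prior.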
1 code implementation • 27 Feb 2023 • Jiyoung Lee, Joon Son Chung, Soo-Whan Chung
This is the first time that face images are used as a condition to train a TTS model.
no code implementations • 3 Feb 2023 • Hwan Heo, Taekyung Kim, Jiyoung Lee, Jaewon Lee, Soohyun Kim, Hyunwoo J. Kim, Jin-Hwa Kim
Multi-resolution hash encoding has recently been proposed to reduce the computational cost of neural renderings, such as NeRF.
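Multi-resolution hash encoding maps a coordinate to concatenated features gathered from one hash table per resolution level. The sketch below is a simplified nearest-vertex variant with fixed random tables and a toy spatial hash; the real technique interpolates the surrounding grid vertices and trains the tables, so every numeric choice here is an illustrative assumption.

```python
import numpy as np

def hash_encode(xy, tables, base_res=16, growth=2.0):
    """Simplified multi-resolution hash encoding (nearest-vertex variant).
    xy: (N, 2) points in [0, 1); tables: list of (T, F) feature tables."""
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)
        gx, gy = np.floor(xy * res).astype(np.uint64).T        # grid vertex per point
        idx = (gx ^ (gy * np.uint64(2654435761))) % np.uint64(len(table))
        feats.append(table[idx])                               # (N, F) at this level
    return np.concatenate(feats, axis=1)                       # (N, L * F)

rng = np.random.default_rng(0)
tables = [rng.normal(size=(2**14, 2)).astype(np.float32) for _ in range(4)]
enc = hash_encode(rng.random((8, 2)), tables)
print(enc.shape)  # 4 levels x 2 features per level
```

The coarse levels give smooth, low-frequency structure while the fine levels resolve detail, which is what lets a small MLP on top of `enc` replace a much larger coordinate network.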
no code implementations • 27 Jan 2023 • Sungdong Kim, Jin-Hwa Kim, Jiyoung Lee, Minjoon Seo
Efficient video-language modeling should consider the computational cost because of a large, sometimes intractable, number of video frames.
Ranked #32 on Video Question Answering on NExT-QA
no code implementations • 24 Oct 2022 • Jiyoung Lee, Hantae Kim, Hyunchang Cho, Edward Choi, Cheonbok Park
Multi-domain Neural Machine Translation (NMT) trains a single model on multiple domains.
no code implementations • 24 Oct 2022 • Dahye Kim, Jungin Park, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn
Given an untrimmed video and a language query depicting a specific temporal moment in the video, video grounding aims to localize the time interval by understanding the text and video simultaneously.
1 code implementation • 22 Sep 2022 • Junyoung Seo, Gyuseong Lee, Seokju Cho, Jiyoung Lee, Seungryong Kim
Specifically, we formulate a diffusion-based matching-and-generation framework that interleaves cross-domain matching and diffusion steps in the latent space by iteratively feeding the intermediate warp into the noising process and denoising it to generate a translated image.
no code implementations • 8 Aug 2022 • Radhika Dua, Jiyoung Lee, Joon-Myoung Kwon, Edward Choi
Automatic deep learning-based examination of ECG signals can lead to inaccurate diagnoses, while manual analysis requires clinicians to reject noisy ECG samples, which costs extra time.
1 code implementation • 27 Jul 2022 • Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, Kwanghoon Sohn
To mitigate this issue, we propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix, to provide a robust initialization of stereo models for online stereo adaptation.
1 code implementation • 25 May 2022 • Jin-Hwa Kim, Yunji Kim, Jiyoung Lee, Kang Min Yoo, Sang-Woo Lee
Based on a recent trend in which multimodal generative evaluations exploit a vision-and-language pre-trained model, we propose the negative Gaussian cross-mutual information using CLIP features as a unified metric, coined Mutual Information Divergence (MID).
Ranked #1 on Human Judgment Classification on Pascal-50S, Hallucination Pair-wise Detection (1-ref), and Hallucination Pair-wise Detection (4-ref) +5
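Under the Gaussian assumption named in the MID entry above, mutual information between two feature sets reduces to a log-determinant expression over their covariances. Below is a rough sketch of that quantity; the toy random features, dimensions, and `eps` regularizer are illustrative stand-ins for real CLIP embeddings, not the paper's implementation.

```python
import numpy as np

def gaussian_mutual_information(x, y, eps=1e-6):
    """Estimate I(X; Y) assuming (X, Y) is jointly Gaussian:
    I = 0.5 * log( det(Sx) * det(Sy) / det(Sxy) )."""
    x = x - x.mean(0)
    y = y - y.mean(0)
    xy = np.concatenate([x, y], axis=1)
    def cov(a):
        c = a.T @ a / (len(a) - 1)
        return c + eps * np.eye(c.shape[0])      # regularize for stability
    dets = [np.linalg.slogdet(cov(a))[1] for a in (x, y, xy)]
    return 0.5 * (dets[0] + dets[1] - dets[2])

rng = np.random.default_rng(0)
z = rng.normal(size=(5000, 4))
img = z + 0.1 * rng.normal(size=(5000, 4))   # stand-in for CLIP image features
txt = z + 0.1 * rng.normal(size=(5000, 4))   # matched "text" features
rand = rng.normal(size=(5000, 4))            # unrelated features
print(gaussian_mutual_information(img, txt), gaussian_mutual_information(img, rand))
```

A matched image-text pair shares information and scores high, while an unrelated pair scores near zero; negating this cross-mutual information yields a divergence-style score, which is the intuition behind MID.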
no code implementations • CVPR 2022 • Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn
This paper presents Probabilistic Video Contrastive Learning, a self-supervised representation learning method that bridges contrastive learning with probabilistic representation.
1 code implementation • CVPR 2022 • Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn
The rise of deep neural networks has led to several breakthroughs for semantic segmentation.
no code implementations • 6 Feb 2022 • Somi Jeong, Jiyoung Lee, Kwanghoon Sohn
We show that the proposed method produces visually diverse and plausible results in multiple domains compared to the state-of-the-art methods.
2 code implementations • 1 Dec 2021 • Woncheol Shin, Gyubok Lee, Jiyoung Lee, Eunyi Lyou, Joonseok Lee, Edward Choi
This is an exploratory study showing that current image quantization (vector quantization) methods do not satisfy translation equivariance in the quantized space due to aliasing.
1 code implementation • 12 Nov 2021 • Kyunghoon Hur, Jiyoung Lee, JungWoo Oh, Wesley Price, Young-Hak Kim, Edward Choi
EHR systems lack a unified code system for representing medical concepts, which acts as a barrier to deploying deep learning models at scale across multiple clinics and hospitals.
no code implementations • 24 Oct 2021 • Jiyoung Lee, Wonjae Kim, Daehoon Gwak, Edward Choi
Periodic signals play an important role in daily lives.
no code implementations • 22 Oct 2021 • Jaehoon Cho, Jiyoung Lee, Changjae Oh, Wonil Song, Kwanghoon Sohn
Video prediction, forecasting the future frames from a sequence of input frames, is a challenging task since the view changes are influenced by various factors, such as the global context surrounding the scene and local motion dynamics.
no code implementations • 31 Aug 2021 • Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn
Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics.
1 code implementation • 8 Aug 2021 • Kyunghoon Hur, Jiyoung Lee, JungWoo Oh, Wesley Price, Young-Hak Kim, Edward Choi
To overcome this problem, we introduce Description-based Embedding, DescEmb, a code-agnostic description-based representation learning framework for predictive modeling on EHR.
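The core idea of a description-based embedding is to represent a medical code through its text description rather than through a code-indexed lookup table, so models can transfer across institutions with different code systems. The sketch below illustrates that idea with a deliberately crude token-hashing encoder; DescEmb itself uses a learned text encoder, and the drug descriptions here are made-up examples.

```python
import zlib
import numpy as np

def desc_embed(description, dim=16):
    """Code-agnostic embedding sketch: hash each token of a code's text
    description to a deterministic random vector and average them."""
    vecs = []
    for tok in description.lower().split():
        rng = np.random.default_rng(zlib.crc32(tok.encode()))  # stable per token
        vecs.append(rng.normal(size=dim))
    return np.stack(vecs).mean(0)

# The same concept described under two different code systems lands close
# together, while an unrelated concept stays far away.
a = desc_embed("acetaminophen 500 mg oral tablet")
b = desc_embed("oral tablet of acetaminophen 500 mg")
c = desc_embed("potassium chloride intravenous injection")
cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos(a, b), cos(a, c))
```

Because the embedding depends only on description text, no per-hospital vocabulary needs to be learned, which is the portability argument the entry above makes.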
no code implementations • 25 Jun 2021 • Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, Ashish Kapoor
The abilities to perform causal and counterfactual reasoning are central properties of human intelligence.
no code implementations • CVPR 2021 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn
As a result, our method learns question-conditioned visual representations of appearance and motion that show strong capability for video question answering.
no code implementations • CVPR 2021 • Jiyoung Lee, Soo-Whan Chung, Sunok Kim, Hong-Goo Kang, Kwanghoon Sohn
In this paper, we address the problem of separating individual speech signals from videos using audio-visual neural processing.
no code implementations • ECCV 2020 • Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn
The goal of video summarization is to select keyframes that are visually diverse and can represent a whole story of an input video.
1 code implementation • ICCV 2019 • Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, Kwanghoon Sohn
We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner.
Ranked #1 on Emotion Recognition in Context on CAER-Dynamic