Search Results for author: Heeseung Yun

Found 9 papers, 7 papers with code

Character Grounding and Re-Identification in Story of Videos and Text Descriptions

no code implementations ECCV 2020 Youngjae Yu, Jongseok Kim, Heeseung Yun, Jiwan Chung, Gunhee Kim

We address character grounding and re-identification in multiple story-based videos like movies and associated text descriptions.

Gender Prediction

Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning

1 code implementation CVPR 2023 Youngjae Yu, Jiwan Chung, Heeseung Yun, Jack Hessel, Jae Sung Park, Ximing Lu, Rowan Zellers, Prithviraj Ammanabrolu, Ronan Le Bras, Gunhee Kim, Yejin Choi

Language models are capable of commonsense reasoning: domain-specific models can learn from explicit knowledge (e.g. commonsense graphs [6], ethical norms [25]), while larger models like GPT-3 manifest broad commonsense reasoning capacity.

Language Modelling reinforcement-learning +2

Panoramic Vision Transformer for Saliency Detection in 360° Videos

1 code implementation 19 Sep 2022 Heeseung Yun, Sehun Lee, Gunhee Kim

360° video saliency detection is a challenging benchmark for 360° video understanding: any projection format of a 360° video introduces non-negligible distortion and discontinuity, and the capture-worthy viewpoint on the omnidirectional sphere is inherently ambiguous.

Saliency Prediction Video Quality Assessment +2
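The distortion mentioned in the abstract above can be made concrete with a small, purely illustrative sketch (not from the paper's code): in an equirectangular projection, each pixel row maps to a fixed latitude band, so rows near the poles cover far less solid angle than rows near the equator. Omnidirectional saliency metrics commonly reweight rows by the cosine of latitude to compensate.

```python
import numpy as np

def equirect_row_weights(height: int) -> np.ndarray:
    """Per-row solid-angle weights for an equirectangular frame.

    Illustrative helper (hypothetical, not the paper's implementation):
    rows near the equator receive weight close to 1, rows near the
    poles close to 0, reflecting how much sphere area each row covers.
    """
    # Latitude of each row center, from near +pi/2 (top) to -pi/2 (bottom).
    lat = (np.arange(height) + 0.5) / height * np.pi - np.pi / 2
    return np.cos(lat)

weights = equirect_row_weights(6)
print(weights.round(3))  # middle (equator) rows weigh more than edge (pole) rows
```

Multiplying a per-pixel saliency or error map by these row weights before averaging is one standard way to keep equirectangular distortion from inflating contributions near the poles.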

Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos

1 code implementation ICCV 2021 Heeseung Yun, Youngjae Yu, Wonsuk Yang, Kangil Lee, Gunhee Kim

However, previous benchmark tasks for panoramic videos remain limited in evaluating the semantic understanding of audio-visual relationships or spherical spatial properties of the surroundings.

Audio-visual Question Answering Question Answering +2

Transitional Adaptation of Pretrained Models for Visual Storytelling

no code implementations CVPR 2021 Youngjae Yu, Jiwan Chung, Heeseung Yun, Jongseok Kim, Gunhee Kim

In this work, we claim that a transitional adaptation task is required between pretraining and finetuning to harmonize the visual encoder and the language model for challenging downstream target tasks like visual storytelling.

Ranked #1 on Visual Storytelling on VIST (ROUGE-L metric, using extra training data)

Image Captioning Language Modelling +3

A Mobile Robot Generating Video Summaries of Seniors' Indoor Activities

1 code implementation 30 Jan 2019 Chih-Yuan Yang, Heeseung Yun, Srenavis Varadaraj, Jane Yung-jen Hsu

We develop a system that generates summaries from seniors' indoor-activity videos captured by a social robot, helping remote family members stay informed about their seniors' daily activities at home.

Action Recognition Human Detection +3
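The abstract above does not detail the summarization pipeline (the tags suggest it relies on human detection and action recognition). As a purely hypothetical sketch of the general idea of video summarization, one simple baseline picks the frames with the largest frame-to-frame change as keyframes:

```python
import numpy as np

def select_keyframes(frames, top_k=3):
    """Pick the top_k frames with the largest frame-to-frame pixel change.

    `frames` is a list of grayscale images as 2-D numpy arrays.
    This is a generic motion-based heuristic for illustration only,
    not the paper's method.
    """
    scores = [0.0]  # the first frame has no predecessor to compare against
    for prev, cur in zip(frames, frames[1:]):
        scores.append(float(np.abs(cur.astype(float) - prev.astype(float)).mean()))
    # Rank frames by activity score, then return the picks in chronological order.
    order = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:top_k])
```

A detector-driven system like the one described would instead score frames by the presence of people and recognized actions, but the overall structure (score frames, keep the most informative ones) is the same.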
