no code implementations • ECCV 2020 • Youngjae Yu, Jongseok Kim, Heeseung Yun, Jiwan Chung, Gunhee Kim
We address character grounding and re-identification in multiple story-based videos like movies and associated text descriptions.
1 code implementation • ICCV 2023 • Heeseung Yun, Joonil Na, Gunhee Kim
Sound can convey significant information for spatial reasoning in our daily lives.
1 code implementation • CVPR 2023 • Youngjae Yu, Jiwan Chung, Heeseung Yun, Jack Hessel, Jae Sung Park, Ximing Lu, Rowan Zellers, Prithviraj Ammanabrolu, Ronan Le Bras, Gunhee Kim, Yejin Choi
Language models are capable of commonsense reasoning: while domain-specific models can learn from explicit knowledge (e. g. commonsense graphs [6], ethical norms [25]), and larger models like GPT-3 manifest broad commonsense reasoning capacity.
1 code implementation • 19 Sep 2022 • Heeseung Yun, Sehun Lee, Gunhee Kim
360$^\circ$ video saliency detection is one of the challenging benchmarks for 360$^\circ$ video understanding since non-negligible distortion and discontinuity occur in the projection of any format of 360$^\circ$ videos, and capture-worthy viewpoint in the omnidirectional sphere is ambiguous by nature.
1 code implementation • 25 May 2022 • Youngjae Yu, Jiwan Chung, Heeseung Yun, Jack Hessel, JaeSung Park, Ximing Lu, Prithviraj Ammanabrolu, Rowan Zellers, Ronan Le Bras, Gunhee Kim, Yejin Choi
Large language models readily adapt to novel settings, even without task-specific training data.
1 code implementation • 11 Oct 2021 • Heeseung Yun, Youngjae Yu, Wonsuk Yang, Kangil Lee, Gunhee Kim
However, previous benchmark tasks for panoramic videos are still limited to evaluate the semantic understanding of audio-visual relationships or spherical spatial property in surroundings.
no code implementations • CVPR 2021 • Youngjae Yu, Jiwan Chung, Heeseung Yun, Jongseok Kim, Gunhee Kim
In this work, we claim that a transitional adaptation task is required between pretraining and finetuning to harmonize the visual encoder and the language model for challenging downstream target tasks like visual storytelling.
Ranked #1 on Visual Storytelling on VIST (ROUGE-L metric, using extra training data)
1 code implementation • ICCV 2021 • Heeseung Yun, Youngjae Yu, Wonsuk Yang, Kangil Lee, Gunhee Kim
However, previous benchmark tasks for panoramic videos are still limited to evaluate the semantic understanding of audio-visual relationships or spherical spatial property in surroundings.
1 code implementation • 30 Jan 2019 • Chih-Yuan Yang, Heeseung Yun, Srenavis Varadaraj, Jane Yung-jen Hsu
We develop a system which generates summaries from seniors' indoor-activity videos captured by a social robot to help remote family members know their seniors' daily activities at home.