Search Results for author: Jongseok Kim

Found 5 papers, 2 papers with code

Character Grounding and Re-Identification in Story of Videos and Text Descriptions

no code implementations • ECCV 2020 • Youngjae Yu, Jongseok Kim, Heeseung Yun, Jiwan Chung, Gunhee Kim

We address character grounding and re-identification in multiple story-based videos like movies and associated text descriptions.

Gender Prediction

Cycled Compositional Learning between Images and Text

no code implementations • 24 Jul 2021 • Jongseok Kim, Youngjae Yu, Seunghwan Lee, Gunhee Kim

Since this one-way mapping is highly under-constrained, we couple it with an inverse relation learning with the Correction Network and introduce a cycled relation for given Image. We participated in the Fashion IQ 2020 challenge and won first place with an ensemble of our model.

Relation

Transitional Adaptation of Pretrained Models for Visual Storytelling

no code implementations • CVPR 2021 • Youngjae Yu, Jiwan Chung, Heeseung Yun, Jongseok Kim, Gunhee Kim

In this work, we claim that a transitional adaptation task is required between pretraining and finetuning to harmonize the visual encoder and the language model for challenging downstream target tasks like visual storytelling.

Ranked #1 on Visual Storytelling on VIST (ROUGE-L metric, using extra training data)

Image Captioning, Language Modelling, +3

Viewpoint-Agnostic Change Captioning With Cycle Consistency

1 code implementation • ICCV 2021 • Hoeseong Kim, Jongseok Kim, Hyungseok Lee, Hyunsung Park, Gunhee Kim

In addition, we propose a cycle consistency module that can potentially improve the performance of any change captioning networks in general by matching the composite feature of the generated caption and before image with the after image feature.

A Joint Sequence Fusion Model for Video Question Answering and Retrieval

2 code implementations • ECCV 2018 • Youngjae Yu, Jongseok Kim, Gunhee Kim

We present an approach named JSFusion (Joint Sequence Fusion) that can measure semantic similarity between any pairs of multimodal sequence data (e.g., a video clip and a language sentence).

Ranked #34 on Video Retrieval on LSMDC

Multiple-choice Question Answering +7
