no code implementations • 20 Jan 2019 • Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang
While conventional methods for sequential learning focus on interactions between consecutive inputs, we propose a new method that captures composite semantic flows with variable-length dependencies.
no code implementations • 3 Jul 2019 • Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang
However, most sequential data, such as videos, exhibit complex temporal dependencies that imply variable-length semantic flows and their compositions, which are hard to capture with conventional methods.
no code implementations • 17 Jan 2020 • Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang
Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering these complex structures of the video.
no code implementations • CVPR 2020 • Eun-Sol Kim, Woo Young Kang, Kyoung-Woon On, Yu-Jung Heo, Byoung-Tak Zhang
HANs follow this process: constructing a common semantic space from the symbolic graphs of each modality, matching semantics between sub-structures of the symbolic graphs, constructing co-attention maps between the graphs in the semantic space, and integrating the multimodal inputs using the co-attention maps to obtain the final joint representation.
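The co-attention step of the pipeline above can be sketched in a few lines. This is an illustrative assumption, not the HAN implementation: given node embeddings of two modality graphs already mapped into a shared semantic space, it builds a co-attention map from pairwise similarities and uses it to form a joint representation.

```python
# Minimal co-attention sketch between two graphs in a shared semantic space.
# Function name, pooling choice, and dimensions are hypothetical.
import numpy as np

def co_attend(g1: np.ndarray, g2: np.ndarray) -> np.ndarray:
    """g1: (n, d), g2: (m, d) node embeddings; returns a joint (d,) vector."""
    sim = g1 @ g2.T                       # (n, m) pairwise matching scores
    attn = np.exp(sim - sim.max())        # softmax over all node pairs
    attn /= attn.sum()
    g2_attended = attn @ g2               # (n, d): g2 nodes seen from g1's view
    return (g1 + g2_attended).mean(axis=0)  # simple integration by pooling

rng = np.random.default_rng(1)
joint = co_attend(rng.normal(size=(3, 4)), rng.normal(size=(5, 4)))
print(joint.shape)  # → (4,)
```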
1 code implementation • 29 Dec 2020 • Sangwoong Yoon, Woo Young Kang, Sungwook Jeon, SeongEun Lee, Changjin Han, Jonghun Park, Eun-Sol Kim
Based on this idea, we propose a novel approach for image-to-image retrieval using scene graph similarity measured by graph neural networks.
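A minimal sketch of this retrieval idea, under stated assumptions (the message-passing scheme, feature dimensions, and toy database here are illustrative, not the paper's architecture): each scene graph is embedded by one round of neighbor aggregation, and images are ranked by cosine similarity of the resulting graph embeddings.

```python
# Hypothetical sketch: rank a scene-graph database by similarity to a query
# graph, using one mean-neighbor message-passing step as the graph encoder.
import numpy as np

def embed_graph(node_feats: np.ndarray, edges: list) -> np.ndarray:
    """One message-passing step (mean over neighbors), then mean-pool nodes."""
    n = node_feats.shape[0]
    agg = np.zeros_like(node_feats)
    deg = np.zeros(n)
    for u, v in edges:                       # undirected aggregation
        agg[u] += node_feats[v]; deg[u] += 1
        agg[v] += node_feats[u]; deg[v] += 1
    deg = np.maximum(deg, 1)[:, None]
    updated = node_feats + agg / deg         # residual update per node
    return updated.mean(axis=0)              # graph-level embedding

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

rng = np.random.default_rng(0)
# toy database: three scene graphs, each with four 8-d node features
db = [(rng.normal(size=(4, 8)), [(0, 1), (1, 2), (2, 3)]) for _ in range(3)]
query = db[1]  # querying with a database graph should rank itself first
q_emb = embed_graph(*query)
scores = [cosine(q_emb, embed_graph(f, e)) for f, e in db]
print(int(np.argmax(scores)))  # → 1
```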
no code implementations • 1 Jan 2021 • Kyoung-Woon On, Eun-Sol Kim, Il-Jae Kwon, Sangwoong Yoon, Byoung-Tak Zhang
To further investigate the effectiveness of our proposed method, we evaluate our approach on a real-world problem, image retrieval with visual scene graphs.
1 code implementation • CVPR 2021 • Bumsoo Kim, Junhyun Lee, Jaewoo Kang, Eun-Sol Kim, Hyunwoo J. Kim
Human-Object Interaction (HOI) detection is the task of identifying "a set of interactions" in an image, which involves i) localization of the subject (i.e., humans) and target (i.e., objects) of the interaction, and ii) classification of the interaction labels.
Ranked #16 on Human-Object Interaction Detection on V-COCO
no code implementations • 29 Sep 2021 • DaeJin Jo, Taehwan Kwon, Sungwoong Kim, Eun-Sol Kim
Therefore, in this work, we develop a novel additive learning algorithm based on reinforcement learning (RL) for few-shot natural language generation (NLG) tasks.
no code implementations • 29 Sep 2021 • Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim
Inspired by this, we tackle video scene segmentation, the task of temporally localizing scene boundaries in a video, with a self-supervised learning framework in which we mainly focus on designing effective pretext tasks.
no code implementations • 13 Oct 2021 • Minchul Shin, Jonghwan Mun, Kyoung-Woon On, Woo-Young Kang, Gunsoo Han, Eun-Sol Kim
The VALUE (Video-And-Language Understanding Evaluation) benchmark is newly introduced to evaluate and analyze multi-modal representation learning algorithms on three video-and-language tasks: Retrieval, QA, and Captioning.
1 code implementation • 14 Jan 2022 • Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim
Inspired by this, we tackle video scene segmentation, the task of temporally localizing scene boundaries in a video, with a self-supervised learning framework in which we mainly focus on designing effective pretext tasks.
no code implementations • CVPR 2022 • Bumsoo Kim, Jonghwan Mun, Kyoung-Woon On, Minchul Shin, Junhyun Lee, Eun-Sol Kim
Human-Object Interaction (HOI) detection is the task of identifying a set of <human, object, interaction> triplets from an image.
1 code implementation • CVPR 2022 • Dohwan Ko, Joonmyung Choi, Juyeon Ko, Shinyeong Noh, Kyoung-Woon On, Eun-Sol Kim, Hyunwoo J. Kim
In this paper, we propose a novel multi-modal self-supervised framework Video-Text Temporally Weak Alignment-based Contrastive Learning (VT-TWINS) to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW).
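The paper builds its alignment objective on Dynamic Time Warping. As background only (VT-TWINS uses its own differentiable variant; the recurrence and toy sequences below are assumptions for illustration), the standard DTW accumulated-cost recurrence can be sketched as:

```python
# Standard DTW: align two sequences of different lengths by accumulating
# local distances over the best of match / insertion / deletion moves.
import numpy as np

def dtw_cost(x: np.ndarray, y: np.ndarray) -> float:
    """Accumulated alignment cost between sequences x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])  # local distance
            D[i, j] = cost + min(D[i - 1, j - 1],       # match
                                 D[i - 1, j],           # step in x only
                                 D[i, j - 1])           # step in y only
    return float(D[n, m])

a = np.array([[0.0], [1.0], [2.0]])
b = np.array([[0.0], [0.0], [1.0], [2.0]])  # same content, one repeated frame
print(dtw_cost(a, b))  # → 0.0 (DTW absorbs the temporal stretch)
```

This tolerance to local temporal stretching is what makes DTW-style alignment attractive for weakly correlated video-text pairs, where matching moments need not be strictly synchronous.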
1 code implementation • ACL 2022 • Yu-Jung Heo, Eun-Sol Kim, Woo Suk Choi, Byoung-Tak Zhang
Knowledge-based visual question answering (QA) aims to answer a question that requires visually grounded external knowledge beyond the image content itself.
1 code implementation • COLING 2022 • DaeJin Jo, Taehwan Kwon, Eun-Sol Kim, Sungwoong Kim
Natural language modeling with limited training data is a challenging problem, and many algorithms address it with large-scale pretrained language models (PLMs) because of their strong generalization ability.
no code implementations • 19 Oct 2022 • Jihyeon Lee, Wooyoung Kang, Eun-Sol Kim
It is well known that most conventional video question answering (VideoQA) datasets consist of easy questions requiring only simple reasoning.
no code implementations • 4 Oct 2023 • Nokyung Park, Daewon Chae, Jeongyong Shim, Sangpil Kim, Eun-Sol Kim, Jinkyu Kim
However, they use the pivot embedding in a global manner (i.e., aligning an image embedding with a sentence-level text embedding), not fully utilizing the semantic cues of the given text description.
no code implementations • Findings (EMNLP) 2021 • Jiyeon Ham, Eun-Sol Kim
Predicting the similarity score consists of two sub-tasks, which are monolingual similarity evaluation and multilingual sentence retrieval.