Search Results for author: Eun-Sol Kim

Found 18 papers, 6 papers with code

HOTR: End-to-End Human-Object Interaction Detection with Transformers

1 code implementation CVPR 2021 Bumsoo Kim, Junhyun Lee, Jaewoo Kang, Eun-Sol Kim, Hyunwoo J. Kim

Human-Object Interaction (HOI) detection is a task of identifying "a set of interactions" in an image, which involves the i) localization of the subject (i.e., humans) and target (i.e., objects) of interaction, and ii) the classification of the interaction labels.

Human-Object Interaction Detection Object +2

Boundary-aware Self-supervised Learning for Video Scene Segmentation

1 code implementation 14 Jan 2022 Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim

Inspired by this, we tackle video scene segmentation, which is a task of temporally localizing scene boundaries in a video, with a self-supervised learning framework where we mainly focus on designing effective pretext tasks.

Scene Segmentation Self-Supervised Learning

Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering

1 code implementation ACL 2022 Yu-Jung Heo, Eun-Sol Kim, Woo Suk Choi, Byoung-Tak Zhang

Knowledge-based visual question answering (QA) aims to answer a question which requires visually-grounded external knowledge beyond image content itself.

Question Answering Visual Question Answering

Video-Text Representation Learning via Differentiable Weak Temporal Alignment

1 code implementation CVPR 2022 Dohwan Ko, Joonmyung Choi, Juyeon Ko, Shinyeong Noh, Kyoung-Woon On, Eun-Sol Kim, Hyunwoo J. Kim

In this paper, we propose a novel multi-modal self-supervised framework Video-Text Temporally Weak Alignment-based Contrastive Learning (VT-TWINS) to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW).

Contrastive Learning Dynamic Time Warping +1

Selective Token Generation for Few-shot Natural Language Generation

1 code implementation COLING 2022 DaeJin Jo, Taehwan Kwon, Eun-Sol Kim, Sungwoong Kim

Natural language modeling with limited training data is a challenging problem, and many algorithms make use of large-scale pretrained language models (PLMs) for this due to its great generalization ability.

Data-to-Text Generation Language Modelling +3

Image-to-Image Retrieval by Learning Similarity between Scene Graphs

1 code implementation 29 Dec 2020 Sangwoong Yoon, Woo Young Kang, Sungwook Jeon, SeongEun Lee, Changjin Han, Jonghun Park, Eun-Sol Kim

Based on this idea, we propose a novel approach for image-to-image retrieval using scene graph similarity measured by graph neural networks.

Graph Similarity Image Retrieval +3

Visualizing Semantic Structures of Sequential Data by Learning Temporal Dependencies

no code implementations 20 Jan 2019 Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang

While conventional methods for sequential learning focus on interaction between consecutive inputs, we suggest a new method which captures composite semantic flows with variable-length dependencies.

Compositional Structure Learning for Sequential Video Data

no code implementations 3 Jul 2019 Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang

However, most sequential data, as seen with videos, have complex temporal dependencies that imply variable-length semantic flows and their compositions, which are hard for conventional methods to capture.

Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data

no code implementations 17 Jan 2020 Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang

Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering these complex structures of the video.

Graph Learning Video Understanding

Hypergraph Attention Networks for Multimodal Learning

no code implementations CVPR 2020 Eun-Sol Kim, Woo Young Kang, Kyoung-Woon On, Yu-Jung Heo, Byoung-Tak Zhang

HANs follow the process: constructing the common semantic space with symbolic graphs of each modality, matching the semantics between sub-structures of the symbolic graphs, constructing co-attention maps between the graphs in the semantic space, and integrating the multimodal inputs using the co-attention maps to get the final joint representation.

Spectrally Similar Graph Pooling

no code implementations 1 Jan 2021 Kyoung-Woon On, Eun-Sol Kim, Il-Jae Kwon, Sangwoong Yoon, Byoung-Tak Zhang

To further investigate the effectiveness of our proposed method, we evaluate our approach on a real-world problem, image retrieval with visual scene graphs.

Image Retrieval Retrieval

Selective Token Generation for Few-shot Language Modeling

no code implementations 29 Sep 2021 DaeJin Jo, Taehwan Kwon, Sungwoong Kim, Eun-Sol Kim

Therefore, in this work, we develop a novel additive learning algorithm based on reinforcement learning (RL) for few-shot natural language generation (NLG) tasks.

Data-to-Text Generation Language Modelling +3

Boundary-aware Pre-training for Video Scene Segmentation

no code implementations 29 Sep 2021 Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim

Inspired by this, we tackle video scene segmentation, which is a task of temporally localizing scene boundaries in a video, with a self-supervised learning framework where we mainly focus on designing effective pretext tasks.

Scene Segmentation Self-Supervised Learning

Winning the ICCV'2021 VALUE Challenge: Task-aware Ensemble and Transfer Learning with Visual Concepts

no code implementations 13 Oct 2021 Minchul Shin, Jonghwan Mun, Kyoung-Woon On, Woo-Young Kang, Gunsoo Han, Eun-Sol Kim

The VALUE (Video-And-Language Understanding Evaluation) benchmark is newly introduced to evaluate and analyze multi-modal representation learning algorithms on three video-and-language tasks: Retrieval, QA, and Captioning.

Model Optimization Representation Learning +2

Semantic Alignment with Calibrated Similarity for Multilingual Sentence Embedding

no code implementations Findings (EMNLP) 2021 Jiyeon Ham, Eun-Sol Kim

Predicting the similarity score consists of two sub-tasks, which are monolingual similarity evaluation and multilingual sentence retrieval.

Retrieval Semantic Similarity +5

Dense but Efficient VideoQA for Intricate Compositional Reasoning

no code implementations 19 Oct 2022 Jihyeon Lee, Wooyoung Kang, Eun-Sol Kim

It is well known that most of the conventional video question answering (VideoQA) datasets consist of easy questions requiring simple reasoning processes.

Question Answering Video Question Answering

Clustering-based Image-Text Graph Matching for Domain Generalization

no code implementations 4 Oct 2023 Nokyung Park, Daewon Chae, Jeongyong Shim, Sangpil Kim, Eun-Sol Kim, Jinkyu Kim

However, they use pivot embedding in a global manner (i.e., aligning an image embedding with a sentence-level text embedding), not fully utilizing the semantic cues of the given text description.

Clustering Domain Generalization +2
