Search Results for author: Eun-Sol Kim

Found 18 papers, 6 papers with code

HOTR: End-to-End Human-Object Interaction Detection with Transformers

1 code implementation • CVPR 2021 • Bumsoo Kim, Junhyun Lee, Jaewoo Kang, Eun-Sol Kim, Hyunwoo J. Kim

Human-Object Interaction (HOI) detection is a task of identifying "a set of interactions" in an image, which involves i) the localization of the subject (i.e., humans) and target (i.e., objects) of interaction, and ii) the classification of the interaction labels.

Ranked #16 on Human-Object Interaction Detection on V-COCO

Tasks: Human-Object Interaction Detection, Object (+2 more)

Boundary-aware Self-supervised Learning for Video Scene Segmentation

1 code implementation • 14 Jan 2022 • Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim

Inspired by this, we tackle video scene segmentation, which is the task of temporally localizing scene boundaries in a video, with a self-supervised learning framework in which we mainly focus on designing effective pretext tasks.

Tasks: Scene Segmentation, Self-Supervised Learning

Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering

1 code implementation • ACL 2022 • Yu-Jung Heo, Eun-Sol Kim, Woo Suk Choi, Byoung-Tak Zhang

Knowledge-based visual question answering (QA) aims to answer a question which requires visually-grounded external knowledge beyond image content itself.

Tasks: Question Answering, Visual Question Answering

Video-Text Representation Learning via Differentiable Weak Temporal Alignment

1 code implementation • CVPR 2022 • Dohwan Ko, Joonmyung Choi, Juyeon Ko, Shinyeong Noh, Kyoung-Woon On, Eun-Sol Kim, Hyunwoo J. Kim

In this paper, we propose a novel multi-modal self-supervised framework Video-Text Temporally Weak Alignment-based Contrastive Learning (VT-TWINS) to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW).

Tasks: Contrastive Learning, Dynamic Time Warping (+1 more)
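
The VT-TWINS snippet says only that it uses "a variant of Dynamic Time Warping (DTW)"; the variant itself is not described in this listing. As a hedged illustration of how DTW can be made differentiable at all, here is a minimal pure-Python sketch of soft-DTW, a well-known relaxation that replaces the hard minimum in the DTW recurrence with a log-sum-exp soft minimum. This is not the VT-TWINS alignment: the function names `softmin` and `soft_dtw`, the squared-difference cost, and the smoothing parameter `gamma` are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence


def softmin(values: Sequence[float], gamma: float) -> float:
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))).

    With gamma == 0 this is the hard min; for gamma > 0 it is a smooth
    (differentiable) lower bound on min(values).
    """
    lo = min(values)
    if gamma == 0.0:
        return lo
    # Shift by the minimum for numerical stability (log-sum-exp trick).
    # Infinite entries contribute exp(-inf) == 0.0.
    return lo - gamma * math.log(sum(math.exp(-(v - lo) / gamma) for v in values))


def soft_dtw(
    x: Sequence[float],
    y: Sequence[float],
    gamma: float = 1.0,
    dist: Callable[[float, float], float] = lambda a, b: (a - b) ** 2,
) -> float:
    """Soft-DTW alignment cost between x and y (illustrative, not VT-TWINS).

    gamma=0 recovers classic DTW.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # R[i][j] is the (soft) cost of the best alignment of x[:i] with y[:j].
    R: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend an alignment by advancing in x, in y, or in both.
            R[i][j] = dist(x[i - 1], y[j - 1]) + softmin(
                (R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]), gamma
            )
    return R[n][m]
```

For example, `soft_dtw([1, 2, 3], [1, 2, 2, 3], gamma=0.0)` returns `0.0` because DTW can stretch the repeated `2`, while `soft_dtw([0, 0], [1, 1], gamma=0.0)` returns `2.0`. Raising `gamma` trades exactness for smoothness, which is what makes the cost usable as a training signal.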

Selective Token Generation for Few-shot Natural Language Generation

1 code implementation • COLING 2022 • DaeJin Jo, Taehwan Kwon, Eun-Sol Kim, Sungwoong Kim

Natural language modeling with limited training data is a challenging problem, and many algorithms make use of large-scale pretrained language models (PLMs) for this due to their strong generalization ability.

Tasks: Data-to-Text Generation, Language Modelling (+3 more)

Image-to-Image Retrieval by Learning Similarity between Scene Graphs

1 code implementation • 29 Dec 2020 • Sangwoong Yoon, Woo Young Kang, Sungwook Jeon, SeongEun Lee, Changjin Han, Jonghun Park, Eun-Sol Kim

Based on this idea, we propose a novel approach for image-to-image retrieval using scene graph similarity measured by graph neural networks.

Tasks: Graph Similarity, Image Retrieval (+3 more)

Visualizing Semantic Structures of Sequential Data by Learning Temporal Dependencies

no code implementations • 20 Jan 2019 • Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang

While conventional methods for sequential learning focus on interaction between consecutive inputs, we suggest a new method which captures composite semantic flows with variable-length dependencies.

Compositional Structure Learning for Sequential Video Data

no code implementations • 3 Jul 2019 • Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang

However, most sequential data, such as videos, have complex temporal dependencies that imply variable-length semantic flows and their compositions, which are hard to capture with conventional methods.

Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data

no code implementations • 17 Jan 2020 • Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang

Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering these complex structures of the video.

Tasks: Graph Learning, Video Understanding

Hypergraph Attention Networks for Multimodal Learning

no code implementations • CVPR 2020 • Eun-Sol Kim, Woo Young Kang, Kyoung-Woon On, Yu-Jung Heo, Byoung-Tak Zhang

HANs follow the process: constructing the common semantic space with symbolic graphs of each modality, matching the semantics between sub-structures of the symbolic graphs, constructing co-attention maps between the graphs in the semantic space, and integrating the multimodal inputs using the co-attention maps to get the final joint representation.

Spectrally Similar Graph Pooling

no code implementations • 1 Jan 2021 • Kyoung-Woon On, Eun-Sol Kim, Il-Jae Kwon, Sangwoong Yoon, Byoung-Tak Zhang

To further investigate the effectiveness of our proposed method, we evaluate our approach on a real-world problem, image retrieval with visual scene graphs.

Tasks: Image Retrieval, Retrieval

Selective Token Generation for Few-shot Language Modeling

no code implementations • 29 Sep 2021 • DaeJin Jo, Taehwan Kwon, Sungwoong Kim, Eun-Sol Kim

Therefore, in this work, we develop a novel additive learning algorithm based on reinforcement learning (RL) for few-shot natural language generation (NLG) tasks.

Tasks: Data-to-Text Generation, Language Modelling (+3 more)

Boundary-aware Pre-training for Video Scene Segmentation

no code implementations • 29 Sep 2021 • Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim

Tasks: Scene Segmentation, Self-Supervised Learning

Winning the ICCV'2021 VALUE Challenge: Task-aware Ensemble and Transfer Learning with Visual Concepts

no code implementations • 13 Oct 2021 • Minchul Shin, Jonghwan Mun, Kyoung-Woon On, Woo-Young Kang, Gunsoo Han, Eun-Sol Kim

The VALUE (Video-And-Language Understanding Evaluation) benchmark is newly introduced to evaluate and analyze multi-modal representation learning algorithms on three video-and-language tasks: Retrieval, QA, and Captioning.

Tasks: Model Optimization, Representation Learning (+2 more)

Semantic Alignment with Calibrated Similarity for Multilingual Sentence Embedding

no code implementations • Findings (EMNLP) 2021 • Jiyeon Ham, Eun-Sol Kim

Predicting the similarity score consists of two sub-tasks, which are monolingual similarity evaluation and multilingual sentence retrieval.

Tasks: Retrieval, Semantic Similarity (+5 more)

MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection

no code implementations • CVPR 2022 • Bumsoo Kim, Jonghwan Mun, Kyoung-Woon On, Minchul Shin, Junhyun Lee, Eun-Sol Kim

Human-Object Interaction (HOI) detection is the task of identifying a set of <human, object, interaction> triplets from an image.

Tasks: Human-Object Interaction Detection
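
HOTR and MSTR both frame HOI detection as predicting a set of <human, object, interaction> triplets. As a minimal sketch of that output structure (not code from either paper), the snippet below represents one triplet and checks a prediction against a ground-truth triplet under a common evaluation convention: the interaction label must agree, and both the human box and the object box must reach an IoU threshold, assumed here to be 0.5. The names `HOITriplet`, `iou`, `triplet_matches`, and the (x1, y1, x2, y2) box format are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Boxes as (x1, y1, x2, y2); this coordinate convention is an assumption.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class HOITriplet:
    """One <human, object, interaction> entry (prediction or ground truth)."""
    human: Box
    obj: Box
    interaction: str


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def triplet_matches(pred: HOITriplet, gt: HOITriplet, thr: float = 0.5) -> bool:
    """True only if the labels agree AND both boxes overlap the ground truth."""
    return (
        pred.interaction == gt.interaction
        and iou(pred.human, gt.human) >= thr
        and iou(pred.obj, gt.obj) >= thr
    )
```

For instance, a predicted "hold" triplet whose human box is (1, 1, 10, 10) against a ground-truth human box of (0, 0, 10, 10) has a human IoU of 0.81 and matches if the object box and label also agree; the same boxes labeled "ride" do not. Requiring both boxes and the label to be right is what makes HOI detection stricter than detecting humans and objects separately.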

Dense but Efficient VideoQA for Intricate Compositional Reasoning

no code implementations • 19 Oct 2022 • Jihyeon Lee, Wooyoung Kang, Eun-Sol Kim

It is well known that most of the conventional video question answering (VideoQA) datasets consist of easy questions requiring simple reasoning processes.

Tasks: Question Answering, Video Question Answering

Clustering-based Image-Text Graph Matching for Domain Generalization

no code implementations • 4 Oct 2023 • Nokyung Park, Daewon Chae, Jeongyong Shim, Sangpil Kim, Eun-Sol Kim, Jinkyu Kim

However, they use pivot embeddings in a global manner (i.e., aligning an image embedding with a sentence-level text embedding), not fully utilizing the semantic cues of the given text description.

Tasks: Clustering, Domain Generalization (+2 more)