Search Results for author: Junyeong Kim

Found 10 papers, 4 papers with code

HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue

1 code implementation • 15 Dec 2023 • Sunjae Yoon, Dahyun Kim, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang D. Yoo

Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal input comprising video, audio, and dialogue history.

Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue

no code implementations • 12 Dec 2022 • Sunjae Yoon, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang D. Yoo

Despite the recent success of multi-modal reasoning to generate answer sentences, existing dialogue systems still suffer from a text hallucination problem, which denotes indiscriminate text-copying from input texts without an understanding of the question.

Tasks: Hallucination, Sentence

Selective Query-guided Debiasing for Video Corpus Moment Retrieval

1 code implementation • 17 Oct 2022 • Sunjae Yoon, Ji Woo Hong, Eunseop Yoon, Dahyun Kim, Junyeong Kim, Hee Suk Yoon, Chang D. Yoo

Video moment retrieval (VMR) aims to localize target moments in untrimmed videos pertinent to a given textual query.

Tasks: Moment Retrieval, Retrieval, +1

Scalable SoftGroup for 3D Instance Segmentation on Point Clouds

1 code implementation • 17 Sep 2022 • Thang Vu, Kookhoi Kim, Tung M. Luu, Thanh Nguyen, Junyeong Kim, Chang D. Yoo

Furthermore, SoftGroup can be extended to perform object detection and panoptic segmentation with nontrivial improvements over existing methods.

Tasks: 3D Instance Segmentation, object-detection, +3

Structured Co-reference Graph Attention for Video-grounded Dialogue

no code implementations • 24 Mar 2021 • Junyeong Kim, Sunjae Yoon, Dahyun Kim, Chang D. Yoo

A video-grounded dialogue system referred to as the Structured Co-reference Graph Attention (SCGA) is presented for decoding the answer sequence to a question regarding a given video while keeping track of the dialogue context.

Tasks: Graph Attention
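SCGA has no released code; purely as a hypothetical illustration of the graph-attention building block its title refers to, the sketch below implements a minimal single-head graph attention layer in PyTorch. The class name, shapes, and masking scheme are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Minimal single-head graph attention (GAT-style) layer.
    Hypothetical sketch; not the SCGA implementation."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x:   (num_nodes, in_dim) node features
        # adj: (num_nodes, num_nodes) binary adjacency (1 = edge)
        h = self.proj(x)                               # (N, out_dim)
        n = h.size(0)
        # Pairwise concatenation of node features for attention scores.
        h_i = h.unsqueeze(1).expand(n, n, -1)
        h_j = h.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.attn(torch.cat([h_i, h_j], dim=-1)).squeeze(-1))
        # Mask out non-edges before normalizing over neighbors.
        e = e.masked_fill(adj == 0, float('-inf'))
        alpha = torch.softmax(e, dim=-1)               # attention weights
        return alpha @ h                               # aggregated node features
```

Including self-loops in the adjacency matrix (e.g. adding torch.eye(n)) keeps every softmax row well defined, since each node then attends to at least itself.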

VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval

1 code implementation • ECCV 2020 • Minuk Ma, Sunjae Yoon, Junyeong Kim, Young-Joon Lee, Sunghun Kang, Chang D. Yoo

This paper explores methods for performing VMR in a weakly-supervised manner (wVMR): training is performed without temporal moment labels but only with the text query that describes a segment of the video.

Tasks: Contrastive Learning, Moment Retrieval, +1
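The linked VLANet implementation is authoritative; purely as an illustration of the weak supervision signal the snippet describes (video-query pairing without temporal moment labels, under the Contrastive Learning tag), here is a generic InfoNCE-style loss sketch. It is not the paper's exact objective, and the names and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def video_query_contrastive_loss(video_emb, query_emb, temperature=0.07):
    """InfoNCE over (video, query) pairs: the i-th video should match the
    i-th query; other queries in the batch act as negatives. Only the
    video-query pairing is needed, no temporal moment labels."""
    v = F.normalize(video_emb, dim=-1)    # (B, D)
    q = F.normalize(query_emb, dim=-1)    # (B, D)
    logits = v @ q.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    # Symmetric loss: video-to-query and query-to-video directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```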

Modality Shifting Attention Network for Multi-modal Video Question Answering

no code implementations • CVPR 2020 • Junyeong Kim, Minuk Ma, Trung Pham, Kyung-Su Kim, Chang D. Yoo

To this end, MSAN comprises (1) a moment proposal network (MPN) that locates the most appropriate temporal moment in each modality, and (2) a heterogeneous reasoning network (HRN) that predicts the answer using an attention mechanism over both modalities.

Tasks: Question Answering, Temporal Localization, +1
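MSAN has no public code; the skeleton below only mirrors the two-stage layout the snippet describes — a moment-proposal step per modality followed by attention-based reasoning over both. Every module name, dimension, and the 5-way answer head are hypothetical.

```python
import torch
import torch.nn as nn

class MSANSkeleton(nn.Module):
    """Two-stage layout: per-modality moment selection, then attention-based
    reasoning over both selected moments. Assumed shapes; not the paper's code.
    dim must be divisible by num_heads."""
    def __init__(self, dim):
        super().__init__()
        self.moment_scorer = nn.Linear(dim, 1)   # stands in for the MPN
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.answer_head = nn.Linear(dim, 5)     # 5-way multiple choice (assumed)

    def select_moment(self, feats):
        # feats: (B, T, D); keep the highest-scoring temporal position.
        scores = self.moment_scorer(feats).squeeze(-1)        # (B, T)
        idx = scores.argmax(dim=-1)                           # (B,)
        return feats[torch.arange(feats.size(0)), idx]        # (B, D)

    def forward(self, video_feats, subtitle_feats, question_emb):
        # Stage 1 (MPN-like): one moment per modality.
        v = self.select_moment(video_feats)
        s = self.select_moment(subtitle_feats)
        # Stage 2 (HRN-like): the question attends over the two moments.
        mem = torch.stack([v, s], dim=1)                      # (B, 2, D)
        ctx, _ = self.attn(question_emb.unsqueeze(1), mem, mem)
        return self.answer_head(ctx.squeeze(1))               # (B, 5)
```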

Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering

no code implementations • 28 May 2019 • Junyeong Kim, Minuk Ma, Kyung-Su Kim, Sungjin Kim, Chang D. Yoo

This paper proposes a method to gain extra supervision via multi-task learning for multi-modal video question answering.

Tasks: Inductive Bias, Metric Learning, +5
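No code accompanies this paper; the sketch below only shows the general shape of gaining extra supervision via multi-task learning — a shared encoder trained jointly on the main QA head and an auxiliary head (e.g. a metric-learning embedding, per the tags). All names, dimensions, and the loss weighting are assumptions.

```python
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    """Shared encoder with a main QA head and an auxiliary head.
    Optimizing both losses gives the encoder extra supervision.
    Hypothetical sketch; not the paper's architecture."""
    def __init__(self, in_dim, hid_dim, num_answers):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.qa_head = nn.Linear(hid_dim, num_answers)  # main QA task
        self.aux_head = nn.Linear(hid_dim, hid_dim)     # e.g. metric-learning embedding

    def forward(self, x):
        h = self.encoder(x)
        return self.qa_head(h), self.aux_head(h)

# Joint objective (weight lambda_aux assumed):
#   loss = qa_loss + lambda_aux * aux_loss
```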

Progressive Attention Memory Network for Movie Story Question Answering

no code implementations • CVPR 2019 • Junyeong Kim, Minuk Ma, Kyung-Su Kim, Sungjin Kim, Chang D. Yoo

To overcome these challenges, PAMN involves three main features: (1) progressive attention mechanism that utilizes cues from both question and answer to progressively prune out irrelevant temporal parts in memory, (2) dynamic modality fusion that adaptively determines the contribution of each modality for answering the current question, and (3) belief correction answering scheme that successively corrects the prediction score on each candidate answer.

Tasks: Question Answering, Video Story QA, +1
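PAMN has no released code; of the three features listed in the snippet, the belief-correction answering scheme is the most compact to sketch: start from an initial score over the candidate answers and apply successive additive corrections, one per reasoning stage. The function below is a loose, assumed formulation, not the paper's equations.

```python
import torch

def belief_correction(initial_logits, corrections):
    """Belief-correction-style answering: successively refine the
    prediction score on each candidate answer. Assumed additive form;
    not PAMN's exact scheme."""
    belief = initial_logits                  # (B, num_answers)
    for delta in corrections:                # each delta: (B, num_answers)
        belief = belief + delta              # one correction per stage
    return torch.softmax(belief, dim=-1)     # final answer distribution
```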

Pivot Correlational Neural Network for Multimodal Video Categorization

no code implementations • ECCV 2018 • Sunghun Kang, Junyeong Kim, Hyun-Soo Choi, Sungjin Kim, Chang D. Yoo

The architecture is trained to maximize the correlation between the hidden states, as well as the predictions, of the modal-agnostic pivot stream and the modal-specific streams in the network.
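There is no public implementation; since the snippet says the pivot and modal-specific streams are trained to have correlated hidden states and predictions, one way such an objective could look is a negative Pearson-correlation loss, sketched below under assumed (batch, dim) shapes. This is an illustration, not the authors' code.

```python
import torch

def correlation_loss(pivot_h, modal_h, eps=1e-8):
    """Negative per-dimension Pearson correlation (across the batch)
    between the pivot stream's and a modal-specific stream's hidden
    states; minimizing it maximizes their correlation."""
    p = pivot_h - pivot_h.mean(dim=0, keepdim=True)   # center: (B, D)
    m = modal_h - modal_h.mean(dim=0, keepdim=True)   # center: (B, D)
    corr = (p * m).sum(dim=0) / (p.norm(dim=0) * m.norm(dim=0) + eps)
    return -corr.mean()
```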
