Search Results for author: Hyounghun Kim

Found 12 papers, 7 papers with code

NDH-Full: Learning and Evaluating Navigational Agents on Full-Length Dialogue

1 code implementation EMNLP 2021 Hyounghun Kim, Jialu Li, Mohit Bansal

In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialogue Navigation (CVDN) dataset, and present a state-of-the-art model which is built upon Vision-Language transformers.

Data Augmentation Dynamic Time Warping +1

Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations

no code implementations 20 Oct 2023 Jihyoung Jang, Minseong Boo, Hyounghun Kim

In this paper, we introduce a new 1M multi-session dialogue dataset, called Conversation Chronicles, for implementing a long-term conversation setup in which time intervals and fine-grained speaker relationships are incorporated.

Dialogue Generation Language Modelling +1

CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination

1 code implementation NAACL 2022 Hyounghun Kim, Abhay Zala, Mohit Bansal

Next, a counterfactual imagined scene change (in textual form) is applied, and the model has to predict the new response to the initial question based on this scene change.

counterfactual

On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets

no code implementations insights (ACL) 2022 Hyounghun Kim, Aishwarya Padmakumar, Di Jin, Mohit Bansal, Dilek Hakkani-Tur

Natural language guided embodied task completion is a challenging problem since it requires understanding natural language instructions, aligning them with egocentric visual observations, and choosing appropriate actions to execute in the environment to produce desired changes.

CAISE: Conversational Agent for Image Search and Editing

1 code implementation 24 Feb 2022 Hyounghun Kim, Doo Soon Kim, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Mohit Bansal

To our knowledge, this is the first dataset that provides conversational image search and editing annotations, where the agent holds a grounded conversation with users and helps them to search and edit images according to their requests.

Image Retrieval

Continuous Language Generative Flow

1 code implementation ACL 2021 Zineng Tang, Shiyue Zhang, Hyounghun Kim, Mohit Bansal

Recent years have witnessed various types of generative models for natural language generation (NLG), especially RNNs or transformer based sequence-to-sequence models, as well as variational autoencoder (VAE) and generative adversarial network (GAN) based models.

Data Augmentation Density Estimation +9

FixMyPose: Pose Correctional Captioning and Retrieval

1 code implementation 4 Apr 2021 Hyounghun Kim, Abhay Zala, Graham Burri, Mohit Bansal

During the correctional-captioning task, models must generate descriptions of how to move from the current to target pose image, whereas in the retrieval task, models should select the correct target pose given the initial pose and correctional description.

Pose Retrieval Retrieval

ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments

no code implementations Findings of the Association for Computational Linguistics 2020 Hyounghun Kim, Abhay Zala, Graham Burri, Hao Tan, Mohit Bansal

During this task, the agent (similar to a PokeMON GO player) is asked to find and collect different target objects one-by-one by navigating based on natural language instructions in a complex, realistic outdoor environment, but then also ARRAnge the collected objects part-by-part in an egocentric grid-layout environment.

Referring Expression Referring Expression Comprehension +1

Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

1 code implementation ACL 2020 Hyounghun Kim, Zineng Tang, Mohit Bansal

Moreover, our model also comprises dual-level attention (word/object and frame level), multi-head self/cross-integration for different sources (video and dense captions), and gates which pass more relevant information to the classifier.

Image Captioning Multi-Label Classification +3

Modality-Balanced Models for Visual Dialogue

no code implementations 17 Jan 2020 Hyounghun Kim, Hao Tan, Mohit Bansal

The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response to the dialogue.

Visual Dialog

Improving Visual Question Answering by Referring to Generated Paragraph Captions

no code implementations ACL 2019 Hyounghun Kim, Mohit Bansal

These paragraph captions can hence contain substantial information about the image for tasks such as visual question answering.

Image Captioning Question Answering +2
