1 code implementation • NAACL (ACL) 2022 • Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Jie Lei, Hyounghun Kim, Rotem Dror, Haoyu Wang, Michael Regan, Qi Zeng, Qing Lyu, Charles Yu, Carl Edwards, Xiaomeng Jin, Yizhu Jiao, Ghazaleh Kazeminejad, Zhenhailong Wang, Chris Callison-Burch, Mohit Bansal, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Martha Palmer, Heng Ji
We introduce RESIN-11, a new schema-guided event extraction and prediction framework that can be applied to a large variety of newsworthy scenarios.
1 code implementation • EMNLP 2021 • Hyounghun Kim, Jialu Li, Mohit Bansal
In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialog Navigation (CVDN) dataset, and present a state-of-the-art model built on Vision-Language transformers.
1 code implementation • 3 Oct 2024 • Minwook Bae, Hyounghun Kim
Generating a long, narratively coherent story of several thousand words with Large Language Models (LLMs) remains a challenging task.
no code implementations • 3 Oct 2024 • Jihyoung Jang, TaeYoung Kim, Hyounghun Kim
Recently introduced dialogue systems have demonstrated high usability.
1 code implementation • 3 Oct 2024 • Seokhyun An, Hyounghun Kim
Our findings illuminate the role of establishing an adequate output space during alignment, highlighting the extensive capabilities already inherent in pre-trained LLMs.
no code implementations • 20 Oct 2023 • Jihyoung Jang, Minseong Boo, Hyounghun Kim
In this paper, we introduce a new 1M multi-session dialogue dataset, called Conversation Chronicles, for implementing a long-term conversation setup in which time intervals and fine-grained speaker relationships are incorporated.
1 code implementation • NAACL 2022 • Hyounghun Kim, Abhay Zala, Mohit Bansal
Next, a counterfactual imagined scene change (in textual form) is applied, and the model has to predict the new response to the initial question based on this scene change.
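As a purely illustrative sketch of this counterfactual evaluation format (the field names and serialization below are assumptions for illustration, not the dataset's actual schema):

```python
# Hypothetical example of the counterfactual setup: the model sees the
# original question plus a textual scene change and must revise its answer.
example = {
    "question": "What color is the car next to the hydrant?",
    "original_answer": "red",
    "scene_change": "The red car drives away and a blue truck parks there.",
}

# One simple way to serialize the pair for a text model:
prompt = (
    f"Question: {example['question']}\n"
    f"Scene change: {example['scene_change']}\n"
    "New answer:"
)
print(prompt)
```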
no code implementations • insights (ACL) 2022 • Hyounghun Kim, Aishwarya Padmakumar, Di Jin, Mohit Bansal, Dilek Hakkani-Tur
Natural language guided embodied task completion is a challenging problem since it requires understanding natural language instructions, aligning them with egocentric visual observations, and choosing appropriate actions to execute in the environment to produce desired changes.
1 code implementation • 24 Feb 2022 • Hyounghun Kim, Doo Soon Kim, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Mohit Bansal
To our knowledge, this is the first dataset that provides conversational image search and editing annotations, where the agent holds a grounded conversation with users and helps them search and edit images according to their requests.
1 code implementation • ACL 2021 • Zineng Tang, Shiyue Zhang, Hyounghun Kim, Mohit Bansal
Recent years have witnessed various types of generative models for natural language generation (NLG), especially RNN- or Transformer-based sequence-to-sequence models, as well as models based on variational autoencoders (VAEs) and generative adversarial networks (GANs).
1 code implementation • 4 Apr 2021 • Hyounghun Kim, Abhay Zala, Graham Burri, Mohit Bansal
During the correctional-captioning task, models must generate descriptions of how to move from the current pose to the target pose image, whereas in the retrieval task, models should select the correct target pose given the initial pose and the correctional description.
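A minimal sketch of how such pose retrieval might be scored — the encoders and tensor names here are assumptions for illustration, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

# Hypothetical joint embeddings: one per (initial pose + correction text)
# query, and a set of candidate target-pose embeddings for each query.
query_emb = torch.randn(4, 256)           # 4 queries
candidate_embs = torch.randn(4, 10, 256)  # 10 candidate target poses each

# Cosine similarity between each query and its candidates; the retrieval
# task then picks the highest-scoring candidate as the target pose.
scores = F.cosine_similarity(query_emb.unsqueeze(1), candidate_embs, dim=-1)
predicted = scores.argmax(dim=1)  # index of the selected pose per query
```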
no code implementations • Findings of the Association for Computational Linguistics 2020 • Hyounghun Kim, Abhay Zala, Graham Burri, Hao Tan, Mohit Bansal
During this task, the agent (similar to a PokeMON GO player) is asked to find and collect different target objects one-by-one by navigating based on natural language instructions in a complex, realistic outdoor environment, but then also ARRAnge the collected objects part-by-part in an egocentric grid-layout environment.
1 code implementation • ACL 2020 • Hyounghun Kim, Zineng Tang, Mohit Bansal
Moreover, our model also comprises dual-level attention (word/object and frame level), multi-head self/cross-integration for different sources (video and dense captions), and gates that pass more relevant information to the classifier.
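For illustration, here is a minimal PyTorch-style sketch of the gating idea described above, where a learned gate decides how much of each source to pass on; all module and variable names are hypothetical, not taken from the paper's released code:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse two source representations (e.g., video and dense-caption
    features) with a learned elementwise gate."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, video_feat, caption_feat):
        # Gate values in (0, 1) decide, per dimension, how much of each
        # source is passed on to the classifier.
        g = torch.sigmoid(self.gate(torch.cat([video_feat, caption_feat], dim=-1)))
        return g * video_feat + (1 - g) * caption_feat

# Usage: the fused features would feed a downstream answer classifier.
fusion = GatedFusion(dim=512)
video = torch.randn(8, 512)      # pooled video features (batch of 8)
captions = torch.randn(8, 512)   # pooled dense-caption features
fused = fusion(video, captions)  # shape: (8, 512)
```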
no code implementations • 17 Jan 2020 • Hyounghun Kim, Hao Tan, Mohit Bansal
The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response in the dialogue.
no code implementations • ACL 2019 • Hyounghun Kim, Mohit Bansal
These paragraph captions can hence contain substantial information about the image, useful for tasks such as visual question answering.