Search Results for author: Seungwhan Moon

Found 33 papers, 13 papers with code

An Analysis of State-of-the-Art Models for Situated Interactive MultiModal Conversations (SIMMC)

no code implementations SIGDIAL (ACL) 2021 Satwik Kottur, Paul Crook, Seungwhan Moon, Ahmad Beirami, Eunjoon Cho, Rajen Subba, Alborz Geramifard

There is a growing interest in virtual assistants with multimodal capabilities, e.g., inferring the context of a conversation through scene understanding.

Scene Understanding

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

no code implementations 27 Sep 2023 Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Tushar Nagarajan, Matt Smith, Shashank Jain, Chun-Fu Yeh, Prakash Murugesan, Peyman Heidari, Yue Liu, Kavya Srinet, Babak Damavandi, Anuj Kumar

We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e., text, image, video, audio, IMU motion sensor), and generates textual responses.

Language Modelling Video Question Answering

Normalized Contrastive Learning for Text-Video Retrieval

1 code implementation 30 Nov 2022 Yookoon Park, Mahmoud Azab, Bo Xiong, Seungwhan Moon, Florian Metze, Gourab Kundu, Kirmani Ahmed

Cross-modal contrastive learning has led the recent advances in multimodal retrieval with its simplicity and effectiveness.

Contrastive Learning Cross-Modal Retrieval +2

Navigating Connected Memories with a Task-oriented Dialog System

1 code implementation 15 Nov 2022 Seungwhan Moon, Satwik Kottur, Alborz Geramifard, Babak Damavandi

Recent years have seen an increasing trend in the volume of personal media captured by users, thanks to the advent of smartphones and smart glasses, resulting in large media collections.


Tell Your Story: Task-Oriented Dialogs for Interactive Content Creation

no code implementations 8 Nov 2022 Satwik Kottur, Seungwhan Moon, Aram H. Markosyan, Hardik Shah, Babak Damavandi, Alborz Geramifard

We collect a new dataset C3 (Conversational Content Creation), comprising 10k dialogs conditioned on media montages simulated from a large media collection.

Benchmarking Retrieval

IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text

1 code implementation 26 Oct 2022 Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Alireza Dirafzoon, Aparajita Saraf, Amy Bearman, Babak Damavandi

We present IMU2CLIP, a novel pre-training approach to align Inertial Measurement Unit (IMU) motion sensor recordings with video and text, by projecting them into the joint representation space of Contrastive Language-Image Pre-training (CLIP).

Activity Recognition Contrastive Learning +1

Zero-Shot Dialogue State Tracking via Cross-Task Transfer

1 code implementation EMNLP 2021 Zhaojiang Lin, Bing Liu, Andrea Madotto, Seungwhan Moon, Paul Crook, Zhenpeng Zhou, Zhiguang Wang, Zhou Yu, Eunjoon Cho, Rajen Subba, Pascale Fung

Zero-shot transfer learning for dialogue state tracking (DST) enables us to handle a variety of task-oriented dialogue domains without the expense of collecting in-domain data.

Dialogue State Tracking Question Answering +1

Leveraging Slot Descriptions for Zero-Shot Cross-Domain Dialogue State Tracking

2 code implementations 10 May 2021 Zhaojiang Lin, Bing Liu, Seungwhan Moon, Paul Crook, Zhenpeng Zhou, Zhiguang Wang, Zhou Yu, Andrea Madotto, Eunjoon Cho, Rajen Subba

Zero-shot cross-domain dialogue state tracking (DST) enables us to handle task-oriented dialogue in unseen domains without the expense of collecting in-domain data.

Dialogue State Tracking Transfer Learning

SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations

1 code implementation EMNLP 2021 Satwik Kottur, Seungwhan Moon, Alborz Geramifard, Babak Damavandi

Next generation task-oriented dialog systems need to understand conversational contexts with their perceived surroundings, to effectively help users in the real-world multimodal environment.

Language Modelling

DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue

1 code implementation ACL 2021 Hung Le, Chinnadhurai Sankar, Seungwhan Moon, Ahmad Beirami, Alborz Geramifard, Satwik Kottur

A video-grounded dialogue system is required to understand both dialogue, which contains semantic dependencies from turn to turn, and video, which contains visual cues of spatial and temporal scene variations.

Object Tracking Visual Reasoning

Continual Learning in Task-Oriented Dialogue Systems

1 code implementation EMNLP 2021 Andrea Madotto, Zhaojiang Lin, Zhenpeng Zhou, Seungwhan Moon, Paul Crook, Bing Liu, Zhou Yu, Eunjoon Cho, Zhiguang Wang

Continual learning in task-oriented dialogue systems can allow us to add new domains and functionalities through time without incurring the high cost of a whole system retraining.

Continual Learning Intent Recognition +3

NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions

1 code implementation Findings (EMNLP) 2021 Zhiyu Chen, Honglei Liu, Hu Xu, Seungwhan Moon, Hao Zhou, Bing Liu

As there is no clean mapping for a user's free form utterance to an ontology, we first model the user preferences as estimated distributions over the system ontology and map the users' utterances to such distributions.

Dialogue State Tracking

Adding Chit-Chat to Enhance Task-Oriented Dialogues

1 code implementation NAACL 2021 Kai Sun, Seungwhan Moon, Paul Crook, Stephen Roller, Becka Silvert, Bing Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, Claire Cardie

Existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e.g., booking hotels), open-domain chatbots aim at making socially engaging conversations.

Dialogue Generation Dialogue Understanding +1

Situated and Interactive Multimodal Conversations

2 code implementations COLING 2020 Seungwhan Moon, Satwik Kottur, Paul A. Crook, Ankita De, Shivani Poddar, Theodore Levin, David Whitney, Daniel Difranco, Ahmad Beirami, Eunjoon Cho, Rajen Subba, Alborz Geramifard

Next generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, in addition to the user's utterances), and perform multimodal actions (e.g., displaying a route in addition to generating the system's utterance).

Response Generation

User Memory Reasoning for Conversational Recommendation

no code implementations COLING 2020 Hu Xu, Seungwhan Moon, Honglei Liu, Pararth Shah, Bing Liu, Philip S. Yu

We study a conversational recommendation model which dynamically manages users' past (offline) preferences and current (online) requests through a structured and cumulative user memory knowledge graph, to allow for natural interactions and accurate recommendations.

Information Seeking in the Spirit of Learning: a Dataset for Conversational Curiosity

1 code implementation EMNLP 2020 Pedro Rodriguez, Paul Crook, Seungwhan Moon, Zhiguang Wang

Assuming a correlation between engagement and user responses such as "liking" messages or asking followup questions, we design a Wizard-of-Oz dialog task that tests the hypothesis that engagement increases when users are presented with facts related to what they know.

Memory Grounded Conversational Reasoning

no code implementations IJCNLP 2019 Seungwhan Moon, Pararth Shah, Rajen Subba, Anuj Kumar

To implement such a system, we collect a new corpus of memory grounded conversations, which comprises human-to-human role-playing dialogs given synthetic memory graphs with simulated attributes.

Memory Graph Networks for Explainable Memory-grounded Question Answering

no code implementations CONLL 2019 Seungwhan Moon, Pararth Shah, Anuj Kumar, Rajen Subba

We introduce Episodic Memory QA, the task of answering personal user questions grounded on memory graph (MG), where episodic memories and related entity nodes are connected via relational edges.

Question Answering

Active Federated Learning

no code implementations 27 Sep 2019 Jack Goetz, Kshitiz Malik, Duc Bui, Seungwhan Moon, Honglei Liu, Anuj Kumar

To exploit this we propose Active Federated Learning, where in each round clients are selected not uniformly at random, but with a probability conditioned on the current model and the data on the client to maximize efficiency.

Federated Learning

Federated User Representation Learning

no code implementations ICLR 2020 Duc Bui, Kshitiz Malik, Jack Goetz, Honglei Liu, Seungwhan Moon, Anuj Kumar, Kang G. Shin

Furthermore, we show that user embeddings learned in FL and the centralized setting have a very similar structure, indicating that FURL can learn collaboratively through the shared parameters while preserving user privacy.

Federated Learning Privacy Preserving +1

OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs

no code implementations ACL 2019 Seungwhan Moon, Pararth Shah, Anuj Kumar, Rajen Subba

We study a conversational reasoning model that strategically traverses through a large-scale common fact knowledge graph (KG) to introduce engaging and contextually diverse entities and attributes.

Knowledge Graphs

Multimodal Named Entity Disambiguation for Noisy Social Media Posts

no code implementations ACL 2018 Seungwhan Moon, Leonardo Neves, Vitor Carvalho

We introduce the new Multimodal Named Entity Disambiguation (MNED) task for multimodal social media posts such as Snapchat or Instagram captions, which are composed of short captions with accompanying images.

Entity Disambiguation Image Captioning +2

Multimodal Named Entity Recognition for Short Social Media Posts

no code implementations NAACL 2018 Seungwhan Moon, Leonardo Neves, Vitor Carvalho

We introduce a new task called Multimodal Named Entity Recognition (MNER) for noisy user-generated data such as tweets or Snapchat captions, which comprise short text with accompanying images.

Named Entity Recognition +1

Joint Photo Stream and Blog Post Summarization and Exploration

no code implementations CVPR 2015 Gunhee Kim, Seungwhan Moon, Leonid Sigal

We alternate between solving the two coupled latent SVM problems, by first fixing the summarization and solving for the alignment from blog images to photo streams and vice versa.

Transfer Learning

Ranking and Retrieval of Image Sequences From Multiple Paragraph Queries

no code implementations CVPR 2015 Gunhee Kim, Seungwhan Moon, Leonid Sigal

While most previous work has dealt with the relations between a natural language sentence and an image or a video, our work extends to the relations between paragraphs and image sequences.


Multimodal Transfer Deep Learning with Applications in Audio-Visual Recognition

no code implementations 9 Dec 2014 Seungwhan Moon, Suyoun Kim, Haohan Wang

We propose a transfer deep learning (TDL) framework that can transfer the knowledge obtained from a single-modal neural network to a network with a different modality.

Video Recognition
