Search Results for author: Seungwhan Moon

Found 35 papers, 13 papers with code

An Analysis of State-of-the-Art Models for Situated Interactive MultiModal Conversations (SIMMC)

no code implementations SIGDIAL (ACL) 2021 Satwik Kottur, Paul Crook, Seungwhan Moon, Ahmad Beirami, Eunjoon Cho, Rajen Subba, Alborz Geramifard

There is a growing interest in virtual assistants with multimodal capabilities, e.g., inferring the context of a conversation through scene understanding.

Scene Understanding

SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM

no code implementations 7 Mar 2024 Jielin Qiu, Andrea Madotto, Zhaojiang Lin, Paul A. Crook, Yifan Ethan Xu, Xin Luna Dong, Christos Faloutsos, Lei Li, Babak Damavandi, Seungwhan Moon

We have developed the SnapNTell Dataset, distinct from traditional VQA datasets: (1) it encompasses a wide range of categorized entities, each represented by images and explicitly named in the answers; (2) it features QA pairs that require extensive knowledge for accurate responses.

Question Answering Retrieval +1

Large Language Models as Zero-shot Dialogue State Tracker through Function Calling

no code implementations 16 Feb 2024 Zekun Li, Zhiyu Zoey Chen, Mike Ross, Patrick Huber, Seungwhan Moon, Zhaojiang Lin, Xin Luna Dong, Adithya Sagar, Xifeng Yan, Paul A. Crook

We also show that by fine-tuning on a small collection of diverse task-oriented dialogues, we can equip modestly sized models, specifically a 13B parameter LLaMA2-Chat model, with function-calling capabilities and DST performance comparable to ChatGPT while maintaining their chat capabilities.

Avg Dialogue State Tracking +1
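
To make the function-calling formulation concrete, here is a minimal, hypothetical sketch of how a dialogue turn could be cast as a function call whose arguments are the slot values. The schema, prompt wording, and parsing below are illustrative assumptions, not the paper's actual prompts or evaluation code.

```python
import json

# Hypothetical slot schema for a restaurant-search intent (illustrative only;
# not the schema or prompt used in the paper).
FIND_RESTAURANT = {
    "name": "find_restaurant",
    "parameters": {
        "area": "preferred part of town",
        "food": "cuisine type",
        "pricerange": "cheap | moderate | expensive",
    },
}

def build_prompt(schema, dialogue):
    """Frame DST as function calling: ask the LLM to emit a call whose
    arguments are the current slot values."""
    return "\n".join([
        "You can call the following function to track the user's goal:",
        json.dumps(schema, indent=2),
        "Dialogue so far:",
        *dialogue,
        "Respond with a single JSON function call.",
    ])

def parse_state(llm_output):
    """Turn the model's function call back into a flat dialogue state."""
    call = json.loads(llm_output)
    return {f"restaurant-{slot}": value for slot, value in call["arguments"].items()}

print(build_prompt(FIND_RESTAURANT, ["User: I want a cheap Italian place in the centre."]))

# A mocked model response in function-call format:
mock_output = ('{"name": "find_restaurant", "arguments": '
               '{"area": "centre", "food": "italian", "pricerange": "cheap"}}')
print(parse_state(mock_output))
```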

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

no code implementations 27 Sep 2023 Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Tushar Nagarajan, Matt Smith, Shashank Jain, Chun-Fu Yeh, Prakash Murugesan, Peyman Heidari, Yue Liu, Kavya Srinet, Babak Damavandi, Anuj Kumar

We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e., text, image, video, audio, IMU motion sensor), and generates textual responses.

Language Modelling Video Question Answering
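
A rough sketch of the general pattern the abstract describes: features from a frozen modality encoder are projected into the LLM's token-embedding space so they can be consumed alongside the text prompt. The dimensions, layer choices, and token counts below are assumptions for illustration, not AnyMAL's actual architecture.

```python
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    """Projects a frozen modality encoder's output tokens into the LLM's
    token-embedding space so they can be prepended to the text prompt.
    Dimensions are illustrative, not AnyMAL's actual configuration."""
    def __init__(self, enc_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(enc_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, enc_tokens: torch.Tensor) -> torch.Tensor:
        # enc_tokens: (batch, n_modality_tokens, enc_dim) from a frozen encoder
        return self.proj(enc_tokens)  # (batch, n_modality_tokens, llm_dim)

# Usage: prepend projected image tokens to the embedded text prompt.
image_tokens = torch.randn(2, 32, 1024)          # frozen image-encoder outputs (stand-in)
text_embeds = torch.randn(2, 16, 4096)           # LLM embeddings of the text prompt (stand-in)
soft_prompt = ModalityProjector()(image_tokens)  # (2, 32, 4096)
llm_inputs = torch.cat([soft_prompt, text_embeds], dim=1)
print(llm_inputs.shape)                          # torch.Size([2, 48, 4096])
```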

Normalized Contrastive Learning for Text-Video Retrieval

1 code implementation 30 Nov 2022 Yookoon Park, Mahmoud Azab, Bo Xiong, Seungwhan Moon, Florian Metze, Gourab Kundu, Kirmani Ahmed

Cross-modal contrastive learning has led the recent advances in multimodal retrieval with its simplicity and effectiveness.

Contrastive Learning Cross-Modal Retrieval +2
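
For context, this is the standard symmetric cross-modal contrastive (InfoNCE) objective that such retrieval models start from, as a minimal PyTorch sketch; the paper's normalization of this objective is its own contribution and is not reproduced here.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(text_emb, video_emb, temperature: float = 0.07):
    """Symmetric InfoNCE over a batch of paired text/video embeddings:
    matched pairs sit on the diagonal of the similarity matrix."""
    text_emb = F.normalize(text_emb, dim=-1)
    video_emb = F.normalize(video_emb, dim=-1)
    logits = text_emb @ video_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))
    loss_t2v = F.cross_entropy(logits, targets)
    loss_v2t = F.cross_entropy(logits.t(), targets)
    return (loss_t2v + loss_v2t) / 2

loss = cross_modal_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```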

Navigating Connected Memories with a Task-oriented Dialog System

1 code implementation 15 Nov 2022 Seungwhan Moon, Satwik Kottur, Alborz Geramifard, Babak Damavandi

Recent years have seen an increasing trend in the volume of personal media captured by users, thanks to the advent of smartphones and smart glasses, resulting in large media collections.

Retrieval

Tell Your Story: Task-Oriented Dialogs for Interactive Content Creation

no code implementations 8 Nov 2022 Satwik Kottur, Seungwhan Moon, Aram H. Markosyan, Hardik Shah, Babak Damavandi, Alborz Geramifard

We collect a new dataset C3 (Conversational Content Creation), comprising 10k dialogs conditioned on media montages simulated from a large media collection.

Benchmarking Retrieval

IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text

1 code implementation 26 Oct 2022 Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Alireza Dirafzoon, Aparajita Saraf, Amy Bearman, Babak Damavandi

We present IMU2CLIP, a novel pre-training approach to align Inertial Measurement Unit (IMU) motion sensor recordings with video and text, by projecting them into the joint representation space of Contrastive Language-Image Pre-training (CLIP).

Activity Recognition Contrastive Learning +1
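
A minimal sketch of the alignment idea: train an IMU encoder so that windows of motion-sensor data land near the CLIP embeddings of the synchronized video (or text). The encoder architecture, window size, and channel count below are illustrative assumptions, not the released IMU2CLIP model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IMUEncoder(nn.Module):
    """Toy 1D-CNN over 6-channel IMU windows (accelerometer + gyroscope),
    projecting into CLIP's embedding dimension. Layer sizes are illustrative."""
    def __init__(self, clip_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(6, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(128, clip_dim),
        )

    def forward(self, imu):  # imu: (batch, 6, timesteps)
        return F.normalize(self.net(imu), dim=-1)

# Align IMU windows with (frozen) CLIP embeddings of the synchronized video clips.
imu_windows = torch.randn(4, 6, 200)
clip_video_emb = F.normalize(torch.randn(4, 512), dim=-1)  # stand-in for frozen CLIP features
imu_emb = IMUEncoder()(imu_windows)
logits = imu_emb @ clip_video_emb.t() / 0.07
loss = F.cross_entropy(logits, torch.arange(4))            # pull matched pairs together
print(loss.item())
```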

Zero-Shot Dialogue State Tracking via Cross-Task Transfer

1 code implementation EMNLP 2021 Zhaojiang Lin, Bing Liu, Andrea Madotto, Seungwhan Moon, Paul Crook, Zhenpeng Zhou, Zhiguang Wang, Zhou Yu, Eunjoon Cho, Rajen Subba, Pascale Fung

Zero-shot transfer learning for dialogue state tracking (DST) enables us to handle a variety of task-oriented dialogue domains without the expense of collecting in-domain data.

Dialogue State Tracking Question Answering +1

Leveraging Slot Descriptions for Zero-Shot Cross-Domain Dialogue State Tracking

2 code implementations 10 May 2021 Zhaojiang Lin, Bing Liu, Seungwhan Moon, Paul Crook, Zhenpeng Zhou, Zhiguang Wang, Zhou Yu, Andrea Madotto, Eunjoon Cho, Rajen Subba

Zero-shot cross-domain dialogue state tracking (DST) enables us to handle task-oriented dialogue in unseen domains without the expense of collecting in-domain data.

Dialogue State Tracking Transfer Learning

SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations

1 code implementation EMNLP 2021 Satwik Kottur, Seungwhan Moon, Alborz Geramifard, Babak Damavandi

Next generation task-oriented dialog systems need to understand conversational contexts with their perceived surroundings, to effectively help users in the real-world multimodal environment.

Language Modelling

DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue

1 code implementation ACL 2021 Hung Le, Chinnadhurai Sankar, Seungwhan Moon, Ahmad Beirami, Alborz Geramifard, Satwik Kottur

A video-grounded dialogue system is required to understand both dialogue, which contains semantic dependencies from turn to turn, and video, which contains visual cues of spatial and temporal scene variations.

Object Tracking Visual Reasoning

Continual Learning in Task-Oriented Dialogue Systems

1 code implementation EMNLP 2021 Andrea Madotto, Zhaojiang Lin, Zhenpeng Zhou, Seungwhan Moon, Paul Crook, Bing Liu, Zhou Yu, Eunjoon Cho, Zhiguang Wang

Continual learning in task-oriented dialogue systems can allow us to add new domains and functionalities through time without incurring the high cost of a whole system retraining.

Continual Learning Intent Recognition +3

NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions

1 code implementation Findings (EMNLP) 2021 Zhiyu Chen, Honglei Liu, Hu Xu, Seungwhan Moon, Hao Zhou, Bing Liu

As there is no clean mapping for a user's free form utterance to an ontology, we first model the user preferences as estimated distributions over the system ontology and map the users' utterances to such distributions.

Dialogue State Tracking
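
A toy sketch of the "distribution over the ontology" idea: rather than committing to one hard slot value, score every ontology value against the utterance and keep the resulting softmax distribution. The slot, its values, and the scoring function here are hypothetical placeholders, not the paper's model.

```python
import torch
import torch.nn.functional as F

# Hypothetical ontology values for a single slot (illustrative only).
CUISINE_VALUES = ["italian", "japanese", "mexican", "thai"]

def utterance_to_distribution(utterance_emb, value_embs, temperature: float = 0.5):
    """Score every ontology value against the utterance and return a softmax
    distribution over the slot's values instead of a single hard label."""
    scores = value_embs @ utterance_emb / temperature
    return F.softmax(scores, dim=-1)

# Stand-in embeddings; in practice these would come from a text encoder.
utterance_emb = F.normalize(torch.randn(256), dim=-1)   # e.g. "something light, maybe sushi?"
value_embs = F.normalize(torch.randn(len(CUISINE_VALUES), 256), dim=-1)

dist = utterance_to_distribution(utterance_emb, value_embs)
print({v: round(p.item(), 3) for v, p in zip(CUISINE_VALUES, dist)})
```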

Adding Chit-Chat to Enhance Task-Oriented Dialogues

1 code implementation NAACL 2021 Kai Sun, Seungwhan Moon, Paul Crook, Stephen Roller, Becka Silvert, Bing Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, Claire Cardie

Existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e.g., booking hotels), open-domain chatbots aim at making socially engaging conversations.

Dialogue Generation Dialogue Understanding +1

Situated and Interactive Multimodal Conversations

2 code implementations COLING 2020 Seungwhan Moon, Satwik Kottur, Paul A. Crook, Ankita De, Shivani Poddar, Theodore Levin, David Whitney, Daniel Difranco, Ahmad Beirami, Eunjoon Cho, Rajen Subba, Alborz Geramifard

Next generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, in addition to the user's utterances), and perform multimodal actions (e.g., displaying a route in addition to generating the system's utterance).

Response Generation

User Memory Reasoning for Conversational Recommendation

no code implementations COLING 2020 Hu Xu, Seungwhan Moon, Honglei Liu, Pararth Shah, Bing Liu, Philip S. Yu

We study a conversational recommendation model which dynamically manages users' past (offline) preferences and current (online) requests through a structured and cumulative user memory knowledge graph, to allow for natural interactions and accurate recommendations.

Information Seeking in the Spirit of Learning: a Dataset for Conversational Curiosity

1 code implementation EMNLP 2020 Pedro Rodriguez, Paul Crook, Seungwhan Moon, Zhiguang Wang

Assuming a correlation between engagement and user responses such as "liking" messages or asking followup questions, we design a Wizard-of-Oz dialog task that tests the hypothesis that engagement increases when users are presented with facts related to what they know.

Memory Graph Networks for Explainable Memory-grounded Question Answering

no code implementations CoNLL 2019 Seungwhan Moon, Pararth Shah, Anuj Kumar, Rajen Subba

We introduce Episodic Memory QA, the task of answering personal user questions grounded on memory graph (MG), where episodic memories and related entity nodes are connected via relational edges.

Question Answering
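
A small illustration of the memory-graph structure the task is grounded on: episodic memory nodes connected to entity nodes by labeled relational edges, over which an answer can be reached by walking a couple of hops. The node names, relations, and lookup logic are invented for illustration and are not the paper's model.

```python
from collections import defaultdict

# Toy memory graph: (memory or entity node, relation, node) triples (illustrative only).
edges = [
    ("memory:beach_trip_2019", "participated_by", "person:Alice"),
    ("memory:beach_trip_2019", "located_at", "place:Santa Cruz"),
    ("memory:birthday_dinner", "participated_by", "person:Alice"),
    ("memory:birthday_dinner", "located_at", "place:Luigi's"),
]

graph = defaultdict(list)
for src, rel, dst in edges:
    graph[src].append((rel, dst))
    graph[dst].append((f"inverse_{rel}", src))  # allow walks in both directions

def one_hop(node, relation):
    """Follow a single relational edge type from a node."""
    return [dst for rel, dst in graph[node] if rel == relation]

# "Where did I go with Alice?" -> memories involving Alice, then their locations.
memories = one_hop("person:Alice", "inverse_participated_by")
answers = [place for m in memories for place in one_hop(m, "located_at")]
print(answers)  # ['place:Santa Cruz', "place:Luigi's"]
```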

Memory Grounded Conversational Reasoning

no code implementations IJCNLP 2019 Seungwhan Moon, Pararth Shah, Rajen Subba, Anuj Kumar

To implement such a system, we collect a new corpus of memory grounded conversations, which comprises human-to-human role-playing dialogs given synthetic memory graphs with simulated attributes.

Federated User Representation Learning

no code implementations ICLR 2020 Duc Bui, Kshitiz Malik, Jack Goetz, Honglei Liu, Seungwhan Moon, Anuj Kumar, Kang G. Shin

Furthermore, we show that user embeddings learned in FL and the centralized setting have a very similar structure, indicating that FURL can learn collaboratively through the shared parameters while preserving user privacy.

Federated Learning Privacy Preserving +1
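
A toy sketch of the split the abstract implies: each user's embedding is a private parameter that never leaves the device, while the remaining parameters are the ones shared and averaged in federated learning. The sizes, layers, and helper below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class FURLStyleModel(nn.Module):
    """Toy parameter split: a private user embedding stays on-device, while
    the shared predictor and item embeddings participate in federated averaging."""
    def __init__(self, n_items: int = 100, dim: int = 16):
        super().__init__()
        self.user_embedding = nn.Parameter(torch.zeros(dim))       # private, never uploaded
        self.item_embeddings = nn.Embedding(n_items, dim)          # shared
        self.shared = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, n_items))

    def forward(self, item_ids):
        user = self.user_embedding.expand(item_ids.size(0), -1)
        items = self.item_embeddings(item_ids)
        return self.shared(torch.cat([user, items], dim=-1))

def shared_state(model):
    """Only these parameters would be sent to the server for federated averaging."""
    return {k: v for k, v in model.state_dict().items() if not k.startswith("user_embedding")}

m = FURLStyleModel()
print(m(torch.tensor([3, 7])).shape)      # torch.Size([2, 100])
print(sorted(shared_state(m).keys()))     # user_embedding is excluded
```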

Active Federated Learning

no code implementations 27 Sep 2019 Jack Goetz, Kshitiz Malik, Duc Bui, Seungwhan Moon, Honglei Liu, Anuj Kumar

To exploit this, we propose Active Federated Learning, where in each round clients are selected not uniformly at random, but with a probability conditioned on the current model and the data on the client, to maximize efficiency.

Federated Learning
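
A minimal sketch of model/data-conditioned client selection, instantiated here with one simple value function (clients whose local loss under the current global model is high are sampled more often); the exact value function and sampling scheme used in the paper may differ.

```python
import math
import random

def select_clients(client_losses, k, temp=1.0):
    """Sample k distinct clients with probability increasing in their current
    local loss under the global model (one simple model/data-conditioned
    value function; illustrative, not the paper's exact criterion)."""
    remaining = dict(client_losses)
    chosen = []
    for _ in range(k):
        names = list(remaining)
        weights = [math.exp(remaining[c] / temp) for c in names]
        pick = random.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.pop(pick)
    return chosen

# Clients whose local data the current global model fits poorly are favored.
losses = {"client_a": 0.2, "client_b": 1.5, "client_c": 0.9, "client_d": 2.1}
print(select_clients(losses, k=2))
```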

OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs

no code implementations ACL 2019 Seungwhan Moon, Pararth Shah, Anuj Kumar, Rajen Subba

We study a conversational reasoning model that strategically traverses through a large-scale common fact knowledge graph (KG) to introduce engaging and contextually diverse entities and attributes.

Knowledge Graphs
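
A toy sketch of the walk idea: starting from an entity mentioned in the dialogue, beam-search over KG edges under a path scorer. Here the scorer is a crude keyword heuristic standing in for the paper's attention-based scoring, and the graph and entities are invented examples.

```python
# Toy KG: head -> [(relation, tail), ...]; entities and relations are illustrative.
KG = {
    "The Matrix": [("directed_by", "Lana Wachowski"), ("starred_actors", "Keanu Reeves")],
    "Keanu Reeves": [("starred_in", "John Wick"), ("starred_in", "Speed")],
    "Lana Wachowski": [("directed", "Cloud Atlas")],
}

def score(path, dialogue_context):
    """Stand-in for the attention-based path scorer: a toy heuristic that
    prefers paths whose entities appear in the dialogue context."""
    text = " ".join(dialogue_context).lower()
    return sum(1.0 for _, _, tail in path if tail.lower() in text) - 0.1 * len(path)

def walk(start, dialogue_context, max_hops=2, beam=3):
    """Beam search over KG edges, keeping the highest-scoring paths."""
    beams = [([], start)]
    for _ in range(max_hops):
        candidates = []
        for path, node in beams:
            for rel, tail in KG.get(node, []):
                new_path = path + [(node, rel, tail)]
                candidates.append((score(new_path, dialogue_context), new_path, tail))
        if not candidates:
            break
        candidates.sort(key=lambda x: -x[0])
        beams = [(p, t) for _, p, t in candidates[:beam]]
    return beams[0][0] if beams else []

print(walk("The Matrix", ["Any other movies with Keanu Reeves?"]))
```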

Multimodal Named Entity Disambiguation for Noisy Social Media Posts

no code implementations ACL 2018 Seungwhan Moon, Leonardo Neves, Vitor Carvalho

We introduce the new Multimodal Named Entity Disambiguation (MNED) task for multimodal social media posts such as Snapchat or Instagram captions, which are composed of short captions with accompanying images.

Entity Disambiguation Image Captioning +2

Multimodal Named Entity Recognition for Short Social Media Posts

no code implementations NAACL 2018 Seungwhan Moon, Leonardo Neves, Vitor Carvalho

We introduce a new task called Multimodal Named Entity Recognition (MNER) for noisy user-generated data such as tweets or Snapchat captions, which comprise short text with accompanying images.

Named Entity Recognition +1

Joint Photo Stream and Blog Post Summarization and Exploration

no code implementations CVPR 2015 Gunhee Kim, Seungwhan Moon, Leonid Sigal

We alternate between solving the two coupled latent SVM problems by first fixing the summarization and solving for the alignment from blog images to photo streams, and vice versa.

Transfer Learning

Ranking and Retrieval of Image Sequences From Multiple Paragraph Queries

no code implementations CVPR 2015 Gunhee Kim, Seungwhan Moon, Leonid Sigal

While most previous work has dealt with the relations between a natural language sentence and an image or a video, our work extends to the relations between paragraphs and image sequences.

Retrieval Sentence

Multimodal Transfer Deep Learning with Applications in Audio-Visual Recognition

no code implementations 9 Dec 2014 Seungwhan Moon, Suyoun Kim, Haohan Wang

We propose a transfer deep learning (TDL) framework that can transfer the knowledge obtained from a single-modal neural network to a network with a different modality.

Video Recognition
