MMConv: An Environment for Multimodal Conversational Search across Multiple Domains

Although conversational search has become a hot topic in both the dialogue research and IR communities, real breakthroughs have been limited by the scale and quality of available datasets. To address this fundamental obstacle, we introduce the Multimodal Multi-domain Conversational dataset (MMConv), a fully annotated collection of human-to-human role-playing dialogues spanning multiple domains and tasks. The contribution is two-fold. First, beyond the task-oriented multimodal dialogues between user and agent pairs, the dialogues are fully annotated with dialogue belief states and dialogue acts. More importantly, we create a relatively comprehensive environment for conducting multimodal conversational search with real user settings, a structured venue database, an annotated image repository as well as a crowd-sourced knowledge database. A detailed description of the data collection procedure along with a summary of the data structure and analysis is provided. Second, we report a set of benchmark results for dialogue state tracking, conversational recommendation and response generation, as well as for a unified model covering multiple tasks. We adopt state-of-the-art methods for each task to demonstrate the usability of the data, discuss the limitations of current methods and set baselines for future studies.

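Since each dialogue is annotated with belief states and dialogue acts and is grounded in a venue database and an annotated image repository, a single turn can be viewed as structured data. Below is a minimal, hypothetical sketch (in Python) of what such a turn could look like; the field names, slots and values are illustrative assumptions for this page, not the dataset's actual schema.

    # Illustrative sketch of an MMConv-style annotated dialogue turn.
    # Field names ("belief_state", "dialogue_acts", "images", ...) are
    # assumptions for illustration only, not the dataset's real schema.
    example_turn = {
        "speaker": "user",
        "utterance": "Can you find a quiet cafe near Orchard with outdoor seating?",
        "images": [],  # references into the annotated image repository, if any
        "belief_state": {
            "venue type": "cafe",       # categorical slot (closed value set)
            "location": "Orchard",      # non-categorical slot (open value)
            "outdoor seating": "yes",
        },
        "dialogue_acts": [
            {"act": "inform", "slot": "venue type", "value": "cafe"},
            {"act": "request", "slot": "recommendation", "value": None},
        ],
    }

    print(example_turn["belief_state"])
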
Datasets


Introduced in the Paper:

MMConv

Used in the Paper:

MultiWOZ, VisDial, TG-ReDial, MMD

Results from the Paper


Task                     Dataset  Model   Metric                     Metric Value  Global Rank
Dialogue State Tracking  MMConv   DS-DST  Categorical Accuracy       91.0          # 2
                                          Non-Categorical Accuracy   23.0          # 2
                                          Overall                    18.0          # 2
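
For context, dialogue state tracking is usually scored at the turn level by comparing the predicted belief state against the gold belief state. The sketch below shows one common way such joint accuracies could be computed, split into categorical and non-categorical slot groups; it follows standard DST evaluation practice and is an assumption, not the official MMConv/DS-DST scorer, and the slot groupings shown are hypothetical.

    # Hypothetical turn-level joint accuracy for DST, split by slot type.
    # NOT the official MMConv scorer; slot groupings are assumptions.
    def joint_accuracy(predictions, golds, slots):
        """Fraction of turns where every slot in `slots` is predicted exactly."""
        correct = sum(
            all(pred.get(s) == gold.get(s) for s in slots)
            for pred, gold in zip(predictions, golds)
        )
        return correct / len(golds) if golds else 0.0

    CATEGORICAL_SLOTS = ["venue type", "outdoor seating"]   # closed value sets
    NON_CATEGORICAL_SLOTS = ["location", "venue name"]      # open values

    preds = [{"venue type": "cafe", "location": "Orchard"}]
    golds = [{"venue type": "cafe", "location": "Orchard Road"}]

    print("categorical:", joint_accuracy(preds, golds, CATEGORICAL_SLOTS))
    print("non-categorical:", joint_accuracy(preds, golds, NON_CATEGORICAL_SLOTS))
    print("overall:", joint_accuracy(preds, golds, CATEGORICAL_SLOTS + NON_CATEGORICAL_SLOTS))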
