4 dataset results for Open-Domain Dialog AND Texts AND English

MultiDoc2Dial (MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents)

MultiDoc2Dial is a new task and dataset on modeling goal-oriented dialogues grounded in multiple documents. Most previous work treats document-grounded dialogue modeling as a machine reading comprehension task based on a single given document or passage. We aim to address more realistic scenarios where a goal-oriented information-seeking conversation involves multiple topics, and hence is grounded in different documents.

22 PAPERS • NO BENCHMARKS YET

MMDialog

MMDialog is a large-scale multi-turn dialogue dataset containing multi-modal open-domain conversations derived from real human-human chat content on social media. MMDialog contains 1.08M dialogue sessions and 1.53M associated images. On average, one dialogue session has 2.59 images, which can appear at any conversation turn.

15 PAPERS • 1 BENCHMARK

ProsocialDialog

Most existing dialogue systems fail to respond properly to potentially unsafe user utterances by either ignoring or passively agreeing with them.

13 PAPERS • 1 BENCHMARK

Reddit Conversation Corpus

Reddit Conversation Corpus (RCC) consists of conversations scraped from Reddit over a 20-month period from November 2016 until August 2018. To ensure the quality and diversity of topics, conversations are collected from 95 selected subreddits. In total, RCC contains 9.2 million 3-turn conversations.

6 PAPERS • NO BENCHMARKS YET
