MMDialog is a large-scale multi-turn dialogue dataset of multi-modal open-domain conversations derived from real human-human chats on social media. It contains 1.08M dialogue sessions and 1.53M unique images. On average, a dialogue session has 2.59 images, which can appear at any conversation turn.
15 PAPERS • 1 BENCHMARK
OpenViDial 2.0 is an open-domain multi-modal dialogue dataset, larger in scale than its predecessor, OpenViDial 1.0. It contains 5.6 million dialogue turns extracted from movies and TV series drawn from different sources, and each dialogue turn is paired with its corresponding visual context.
1 PAPER • 1 BENCHMARK