A fact-checking (FC) dataset that contains pairs (a multimodal tweet and an FC article) from snopes.com.
20 PAPERS • 1 BENCHMARK
DuLeMon is a large-scale Chinese Long-term Memory Conversation dataset, which simulates long-term memory conversations and focuses on the ability to actively construct and utilize the user's and the bot's persona in a long-term interaction. DuLeMon contains about 27.5k human-human conversations, 449k utterances, and 12k persona grounding sentences. This corpus can be used to explore Long-term Memory Conversation, Personalized Dialogue, and Persona Extraction / Matching / Retrieval.
11 PAPERS • NO BENCHMARKS YET
The Composed Quora dataset consists of questions extracted from Quora, grouped together when they ask the same thing. The dataset contains 60,400 groups of questions, each with at least 3 questions that ask the same thing.
1 PAPER • NO BENCHMARKS YET
PSM is a financial-domain dataset for the pairwise search matching task. It aims to identify the semantic similarity of a sentence pair in a search scenario.