4 dataset results for Text Matching AND Texts

Snopes

A fact-checking (FC) dataset containing pairs of a multimodal tweet and an FC article from snopes.com.

20 PAPERS • 1 BENCHMARK

DuLeMon (Baidu Long-term Memory Conversation)

DuLeMon is a large-scale Chinese Long-term Memory Conversation dataset, which simulates long-term memory conversations and focuses on the ability to actively construct and utilize the user's and the bot's persona in a long-term interaction. DuLeMon contains about 27.5k human-human conversations, 449k utterances, and 12k persona grounding sentences. This corpus can be used to explore Long-term Memory Conversation, Personalized Dialogue, and Persona Extraction / Matching / Retrieval.

11 PAPERS • NO BENCHMARKS YET

Composed Quora

The Composed Quora dataset consists of questions extracted from Quora that are grouped together if they ask the same thing. The dataset contains 60,400 groups of questions, each with at least 3 questions that ask the same thing.

1 PAPER • NO BENCHMARKS YET

PSM

PSM is a financial-domain dataset for the pairwise search matching task. The task is to identify the semantic similarity of a sentence pair in a search scenario.

1 PAPER • NO BENCHMARKS YET
