2 dataset results for Text Matching AND Texts AND Chinese

PSM is a financial-domain dataset for the pairwise search matching task. It aims to identify the semantic similarity of a sentence pair in a search scenario.

1 PAPER • NO BENCHMARKS YET

DuLeMon (Baidu Long-term Memory Conversation)

DuLeMon is a large-scale Chinese Long-term Memory Conversation dataset. It simulates long-term memory conversations and focuses on the ability to actively construct and use both the user's and the bot's personas over a long-term interaction. DuLeMon contains about 27.5k human-human conversations, 449k utterances, and 12k persona grounding sentences. The corpus can be used to explore Long-term Memory Conversation, Personalized Dialogue, and Persona Extraction / Matching / Retrieval.

11 PAPERS • NO BENCHMARKS YET
