5 dataset results for Conversational Response Generation

CPED (Chinese Personalized and Emotional Dialogue)

We construct a dataset named CPED from 40 Chinese TV shows. CPED consists of multi-source knowledge related to empathy and personal characteristics. This knowledge covers 13 emotions, gender, Big Five personality traits, 19 dialogue acts, and other knowledge.

15 PAPERS • 3 BENCHMARKS

DuLeMon (Baidu Long-term Memory Conversation)

DuLeMon is a large-scale Chinese Long-term Memory Conversation dataset, which simulates long-term memory conversations and focuses on the ability to actively construct and utilize the user's and the bot's persona in a long-term interaction. DuLeMon contains about 27.5k human-human conversations, 449k utterances, and 12k persona grounding sentences. This corpus can be used to explore Long-term Memory Conversation, Personalized Dialogue, and Persona Extraction / Matching / Retrieval.

12 PAPERS • NO BENCHMARKS YET

Alpaca Data Galician

1 PAPER • NO BENCHMARKS YET

Arabic-ToD (Arabic Task-Oriented Dialogue dataset)

The Arabic-ToD dataset is based on the BiToD dataset. Of the 3,689 BiToD-English dialogues, 1,500 dialogues (30,000 utterances) were translated into Arabic. We translated task-related keywords such as cuisine, dietary restrictions, and price level for the restaurant domain; price level for the hotel domain; type and price level for the attraction domain; and day, weather, and city for the weather domain. The remaining values, such as hotel and restaurant names, locations, and addresses, were kept untranslated. These values are real entities in Hong Kong (literals), and most of them contain Chinese words written in English, so they have not been translated. For slot-value pairs, slot names are kept in English and their corresponding values are translated, except for the Hong Kong entities, since the Arabic-ToD dataset supports code-switching.
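The translation policy described above can be illustrated with a minimal sketch. It is not the dataset authors' code: the slot names, the tiny glossary, the example dialogue state, and the function name `translate_slot_values` are all hypothetical and chosen only to show the idea that slot names stay in English, task-related values are translated, and entity values are left as-is.

```python
# Hypothetical sketch of the selective translation policy.
# Slot names, glossary entries, and the function name are assumptions,
# not taken from the Arabic-ToD release.

# Slots whose values are task-related keywords and therefore translated.
TRANSLATABLE_SLOTS = {
    "cuisine", "dietary_restrictions", "price_level",
    "type", "day", "weather", "city",
}

# Tiny illustrative English -> Arabic glossary.
GLOSSARY = {
    "cheap": "رخيص",
    "sunny": "مشمس",
    "Monday": "الاثنين",
}


def translate_slot_values(slots, glossary=GLOSSARY):
    """Return a copy of `slots` with translatable values mapped to Arabic.

    Slot names are never changed. Values of non-translatable slots
    (names, locations, addresses) are kept untouched, which yields
    code-switched slot-value pairs.
    """
    translated = {}
    for name, value in slots.items():
        if name in TRANSLATABLE_SLOTS and value in glossary:
            translated[name] = glossary[value]
        else:
            translated[name] = value
    return translated


# Made-up dialogue state; "Example Restaurant" stands in for a real entity.
state = {"name": "Example Restaurant", "price_level": "cheap", "day": "Monday"}
result = translate_slot_values(state)
```

In this sketch the `name` value passes through unchanged while `price_level` and `day` are replaced by their glossary entries, mirroring the mix of English slot names, Arabic values, and untranslated entities described in the entry.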

1 PAPER • NO BENCHMARKS YET

ChatGPT Role-Play Dataset (CRD)

Dataset overview:
vanilla.csv: interactions without specific role-play instructions.
boss.csv: interactions in which ChatGPT plays the role of the user's boss.
classmate.csv: interactions in which ChatGPT acts as the user's classmate.
Each turn was coded with the user motives behind user responses or the perceived naturalness of ChatGPT responses.

1 PAPER • NO BENCHMARKS YET
