The PERSONA-CHAT dataset contains multi-turn dialogues conditioned on personas. It consists of 8939 complete dialogues for training, 1000 for validation, and 968 for testing. Each dialogue was conducted between two crowdsourced workers, each assuming an artificial persona described by 3 to 5 profile sentences (such as “I like to ski”, “I am an artist”, “I eat sardines for breakfast daily”). There are 955 possible personas for training, 100 for validation, and 100 for testing. A revised version of the persona descriptions is also provided, obtained by rephrasing, generalizing, or specializing the original ones.
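As a rough illustration of the structure described above, the following Python sketch records the quoted split sizes and models one dialogue as two personas plus a list of turns. The `Dialogue` class, its field names, and the `totals` helper are hypothetical and chosen only for this example; they are not the dataset's official schema or loader. The combined persona count additionally assumes the persona sets do not overlap across splits.

```python
from dataclasses import dataclass
from typing import Dict, List

# Split sizes exactly as quoted in the description above.
SPLITS: Dict[str, Dict[str, int]] = {
    "train": {"dialogues": 8939, "personas": 955},
    "valid": {"dialogues": 1000, "personas": 100},
    "test": {"dialogues": 968, "personas": 100},
}


@dataclass
class Dialogue:
    """Hypothetical record layout (illustrative field names, not the official schema)."""

    persona_a: List[str]  # profile sentences of the first worker
    persona_b: List[str]  # profile sentences of the second worker
    turns: List[str]      # alternating utterances, first worker speaks first

    def __post_init__(self) -> None:
        # The description states that each persona has 3 to 5 profile sentences.
        for persona in (self.persona_a, self.persona_b):
            if not 3 <= len(persona) <= 5:
                raise ValueError("a persona needs 3 to 5 profile sentences")


def totals(splits: Dict[str, Dict[str, int]] = SPLITS) -> Dict[str, int]:
    """Sum dialogue and persona counts over all splits.

    The persona total assumes the persona sets are disjoint across splits.
    """
    return {
        key: sum(split[key] for split in splits.values())
        for key in ("dialogues", "personas")
    }
```

Under these assumptions, `totals()` returns `{"dialogues": 10907, "personas": 1155}`, which is simply the sum of the split sizes quoted above.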

Source: Dually Interactive Matching Network for Personalized Response Selection in Retrieval-Based Chatbots