The PERSONA-CHAT dataset contains multi-turn dialogues conditioned on personas. It consists of 8939 complete dialogues for training, 1000 for validation, and 968 for testing. Each dialogue was conducted between two crowdworkers, each assuming an artificial persona described by 3 to 5 profile sentences (such as “I like to ski”, “I am an artist”, “I eat sardines for breakfast daily”). There are 955 possible personas for training, 100 for validation, and 100 for testing. Additionally, a revised version of the persona descriptions is provided, created by rephrasing, generalizing, or specializing the original ones.
186 PAPERS • 1 BENCHMARK
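A minimal sketch of how one PERSONA-CHAT dialogue might be held in memory, with the 3-to-5-sentence persona constraint checked and the split totals summed. The class `PersonaDialogue`, its field names, and the speaker-A-first turn order are hypothetical assumptions for illustration, not the dataset's official file format; only the split sizes come from the description above.

```python
from dataclasses import dataclass, field

# Split sizes as stated in the PERSONA-CHAT description.
DIALOGUES = {"train": 8939, "valid": 1000, "test": 968}
PERSONAS = {"train": 955, "valid": 100, "test": 100}


@dataclass
class PersonaDialogue:
    """Hypothetical record for one dialogue (field names are assumptions)."""

    persona_a: list[str]  # 3 to 5 profile sentences for the first worker
    persona_b: list[str]  # 3 to 5 profile sentences for the second worker
    turns: list[str] = field(default_factory=list)  # alternating utterances

    def __post_init__(self) -> None:
        # Enforce the "3 to 5 profile sentences" constraint on both personas.
        for persona in (self.persona_a, self.persona_b):
            if not 3 <= len(persona) <= 5:
                raise ValueError("each persona needs 3 to 5 profile sentences")

    def speaker(self, i: int) -> str:
        """Return 'A' or 'B' for turn i, assuming worker A speaks first."""
        return "A" if i % 2 == 0 else "B"


if __name__ == "__main__":
    d = PersonaDialogue(
        persona_a=["I like to ski.", "I am an artist.",
                   "I eat sardines for breakfast daily."],
        persona_b=["I have two dogs.", "I work nights.", "I love jazz."],
        turns=["Hi! Do you ski?", "No, but I walk my dogs a lot."],
    )
    print(d.speaker(1))              # B
    print(sum(DIALOGUES.values()))   # 10907 dialogues in total
    print(sum(PERSONAS.values()))    # 1155 personas in total
```

Validating in `__post_init__` keeps malformed personas from entering downstream code; a real loader would also need to handle the revised persona descriptions, which this sketch ignores.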
A dataset collected to analyze how these capabilities mesh together in a natural conversation and to compare the performance of different architectures and training schemes.
10 PAPERS • 1 BENCHMARK
A large-scale collection of visually grounded, task-oriented dialogues in English, designed to investigate how shared dialogue history accumulates over the course of a conversation.
5 PAPERS • NO BENCHMARKS YET
Taiga is a corpus in which text sources and their metadata are collected according to popular ML tasks.
4 PAPERS • NO BENCHMARKS YET
The Metaphorical Connections dataset is a poetry dataset that pairs metaphorical prompts with short poems. Each poem is annotated for whether it successfully communicates the idea of its metaphorical prompt.
2 PAPERS • NO BENCHMARKS YET