This dataset is designed for evaluating Black-box Multi-agent Integration, the task of combining the capabilities of multiple black-box conversational agents at scale. It supports two main approaches to the task: question-agent pairing and question-response pairing.
Overall, this dataset contains 5,550 utterances across 37 domains. Each utterance has 19 question-response pairs (one from each of the 19 agents), for 105,450 pairs in total. The utterances are split into a training set of 3,700 (100 per domain) and a test set of 1,850 (50 per domain). The training and test sets contain 2,399 and 1,186 utterances, respectively, with at least one positive question-response pair. In the remaining examples, none of the agents achieved annotator agreement (>= 3).
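
The counts above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check built from the figures stated in this description. It does not load the dataset, and all variable names are illustrative assumptions, not fields of the actual data files.

```python
# Figures taken from the dataset description above.
NUM_DOMAINS = 37
NUM_AGENTS = 19
TRAIN_PER_DOMAIN = 100
TEST_PER_DOMAIN = 50
TRAIN_WITH_POSITIVE = 2399  # training utterances with >= 1 positive pair
TEST_WITH_POSITIVE = 1186   # test utterances with >= 1 positive pair

# Derived totals (variable names are hypothetical, chosen for illustration).
train_utterances = NUM_DOMAINS * TRAIN_PER_DOMAIN
test_utterances = NUM_DOMAINS * TEST_PER_DOMAIN
total_utterances = train_utterances + test_utterances
total_pairs = total_utterances * NUM_AGENTS

# Utterances for which no agent produced a positive response.
train_without_positive = train_utterances - TRAIN_WITH_POSITIVE
test_without_positive = test_utterances - TEST_WITH_POSITIVE

print(f"train utterances: {train_utterances}")
print(f"test utterances: {test_utterances}")
print(f"total utterances: {total_utterances}")
print(f"total question-response pairs: {total_pairs}")
print(f"train utterances without a positive pair: {train_without_positive}")
print(f"test utterances without a positive pair: {test_without_positive}")
print(f"train positive coverage: {TRAIN_WITH_POSITIVE / train_utterances:.1%}")
print(f"test positive coverage: {TEST_WITH_POSITIVE / test_utterances:.1%}")
```

Under these figures, roughly two thirds of the utterances in each split have at least one positive question-response pair, so any evaluation should decide how to treat the utterances that have none.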