The AlpacaEval set contains 805 instructions from Self-Instruct, Open Assistant, Vicuna, Koala, and HH-RLHF. They were selected so that the ranking of models on the AlpacaEval set would be similar to their ranking on the Alpaca demo data.
70 PAPERS • 1 BENCHMARK
A large-scale collection of visually-grounded, task-oriented dialogues in English designed to investigate shared dialogue history accumulating during conversation.
10 PAPERS • NO BENCHMARKS YET
Our dataset contains questions from the well-known software testing textbook Introduction to Software Testing, 2nd Edition, by Ammann and Offutt. We use all the textbook questions in Chapters 1 to 5 whose solutions are available on the book's official website.
0 PAPERS • NO BENCHMARKS YET
The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies intended to aid innovation in natural language understanding and conversational models, and to support the study of modern customer support practices and their impact.