ACL 2022 • Huihan Li, Tianyu Gao, Manan Goenka, Danqi Chen
In this work, we conduct the first large-scale human evaluation of state-of-the-art conversational QA systems, where human evaluators converse with models and judge the correctness of their answers.