Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach

20 Feb 2021 · Haoming Jiang, Bo Dai, Mengjiao Yang, Tuo Zhao, Wei Wei

Reliable automatic evaluation of dialogue systems under an interactive environment has long been overdue. An ideal environment for evaluating dialog systems, also known as the Turing test, needs to involve human interaction, which is usually not affordable for large-scale experiments...
