Data-driven Natural Language Generation: Paving the Road to Success

28 Jun 2017 · Jekaterina Novikova, Ondřej Dušek, Verena Rieser

We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) the lack of reliable automatic evaluation metrics for NLG, and (b) the scarcity of high-quality in-domain corpora. We address the first problem by thoroughly analysing current evaluation metrics and motivating the need for a new, more reliable metric. We address the second by presenting a novel framework for developing and evaluating a high-quality corpus for NLG training.
