20 papers with code • 1 benchmark • 2 datasets
Large end-to-end neural open-domain chatbots are becoming increasingly popular.
ProphetNet is a pre-training-based natural language generation method that shows strong performance on English text summarization and question generation tasks.
Open-domain dialog generation is a challenging problem: maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets can produce inappropriate, biased, or offensive text.
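The repetition problem mentioned above can be illustrated with a toy sketch (not drawn from any of the cited papers): greedily decoding the single most likely next token from a small hand-written bigram table, the way a maximum-likelihood-trained model's argmax decoding does, can fall into a loop and repeat the same phrase indefinitely.

```python
def greedy_decode(start, bigram, steps=12):
    """Always pick the single most likely next token (argmax decoding)."""
    out = [start]
    for _ in range(steps):
        next_probs = bigram.get(out[-1])
        if not next_probs:
            break  # no continuation known for this token
        out.append(max(next_probs, key=next_probs.get))
    return out

# Hypothetical bigram probabilities; the argmax transitions form a cycle.
bigram = {
    "i":    {"am": 0.6, "think": 0.4},
    "am":   {"not": 0.7, "here": 0.3},
    "not":  {"sure": 0.8, "now": 0.2},
    "sure": {"i": 0.9, ".": 0.1},
}

print(" ".join(greedy_decode("i", bigram)))
# → i am not sure i am not sure i am not sure i
```

Sampling from the distribution (or penalizing repeated n-grams) instead of always taking the argmax is one common way to break such loops.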
Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment.
To investigate the strengths of this novel metric and of interactive evaluation, compared with state-of-the-art metrics and human evaluation of static conversations, we perform extended experiments with a set of models, including several that improve on recent hierarchical dialog generation architectures through utterance-level sentiment and semantic knowledge distillation.
The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research.
The aim of this paper is to mitigate the shortcomings of automatic evaluation of open-domain dialog systems through multi-reference evaluation.
Open-domain human-computer conversation has been attracting increasing attention over the past few years.