More importantly, our fine-tuned CoNLL2003 model displays significant gains in generalization to out-of-domain datasets: on the OntoNotes subset we achieve an F1 of 72.67, which is 0.49 absolute points better than the baseline, and on the WNUT16 set an F1 of 68.22, a gain of 0.48 points.
Nevertheless, additional pre-training closer to the end-task, such as training on synthetic QA pairs, has been shown to improve performance.
Major companies, including Google, IBM, Microsoft, and Amazon, have invested significantly in dialog systems (tools and runtimes) for building conversational systems.
The key idea is to quantize the dialog space into clusters and create a language model across the clusters, thus allowing for an accurate choice of the next utterance in the conversation.
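To make the idea concrete, here is a minimal sketch of such a cluster-quantized dialog model. Everything in it is an illustrative assumption rather than the authors' method: the hypothetical `embed()` encoder, the k-means quantization, the toy dialogs, and a bigram model standing in for the language model over clusters.

```python
# Sketch: quantize the dialog space into clusters, then model the
# conversation as a sequence of cluster IDs. All names and choices here
# (embed(), k-means, bigram counts) are assumptions for illustration.
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def embed(utterance: str) -> np.ndarray:
    """Hypothetical utterance encoder; any sentence embedding would do."""
    rng = np.random.default_rng(abs(hash(utterance)) % (2**32))
    return rng.standard_normal(64)

# Toy training dialogs (placeholder data).
dialogs = [
    ["hi", "hello, how can I help?", "I need to reset my password",
     "sure, what's your email?"],
    ["hello", "hi, how can I help?", "my order is late",
     "sorry, let me check on that"],
]

# 1. Quantize the dialog space: cluster all utterance embeddings.
utterances = [u for d in dialogs for u in d]
X = np.stack([embed(u) for u in utterances])
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# 2. Language model across clusters: bigram counts over cluster sequences,
#    one sequence per dialog.
bigram = defaultdict(lambda: defaultdict(int))
for d in dialogs:
    ids = kmeans.predict(np.stack([embed(u) for u in d]))
    for prev, nxt in zip(ids, ids[1:]):
        bigram[prev][nxt] += 1

def next_cluster(utterance: str) -> int:
    """Map the current utterance to its cluster, then pick the most
    likely next cluster under the bigram model."""
    cur = kmeans.predict(embed(utterance).reshape(1, -1))[0]
    candidates = bigram[cur]
    return int(max(candidates, key=candidates.get)) if candidates else int(cur)

print(next_cluster("hi"))
```

At inference time, a real system would select the next utterance from within the predicted cluster (e.g., the candidate nearest the cluster centroid); the bigram model above is only the simplest instance of a language model over cluster sequences.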
Our CTC word model achieves a word error rate of 13.0%/18.8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or decoder, compared with 9.6%/16.0% for phone-based CTC with a 4-gram LM.