Training chatbots with reinforcement learning is challenging because of high-dimensional states, infinite action spaces, and the difficulty of specifying a reward function.
Experimental results using chitchat data reveal that (1) near human-like dialogue policies can be induced, (2) generalisation to unseen data is a difficult problem, and (3) training an ensemble of chatbot agents is essential for improved performance over using a single agent.
The amount of dialogue history given to a conversational agent is often underestimated, or chosen empirically and thus possibly naively.
The main goal of this paper is to develop out-of-domain (OOD) detection for dialog systems.
We then used domain-category analysis as an auxiliary task to train neural sentence embeddings for OOD sentence detection.
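
To make the role of sentence embeddings in OOD detection concrete, the following is a minimal sketch and not the method evaluated here. It assumes that sentence embeddings are already available as plain vectors, and it scores an utterance by its highest cosine similarity to any in-domain centroid. The function names (`cosine`, `centroids`, `is_ood`), the toy two-dimensional vectors, and the threshold of 0.5 are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroids(embeddings_by_domain):
    """Average the embeddings of each in-domain category."""
    result = {}
    for domain, vectors in embeddings_by_domain.items():
        dim = len(vectors[0])
        result[domain] = [
            sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)
        ]
    return result


def is_ood(embedding, domain_centroids, threshold=0.5):
    """Flag an utterance as OOD when no centroid is similar enough.

    The threshold is an assumed value; in practice it would be tuned
    on held-out data.
    """
    best = max(cosine(embedding, c) for c in domain_centroids.values())
    return best < threshold


# Toy embeddings standing in for the output of a trained sentence encoder.
train = {
    "weather": [[1.0, 0.1], [0.9, 0.0]],
    "music": [[0.1, 1.0], [0.0, 0.9]],
}
cents = centroids(train)

in_domain_query = [0.95, 0.05]  # close to the "weather" centroid
ood_query = [-1.0, -1.0]        # points away from both centroids

print(is_ood(in_domain_query, cents))
print(is_ood(ood_query, cents))
```

In this sketch the domain categories enter only through the centroids. An encoder trained with domain-category analysis as an auxiliary task would be expected to produce embeddings in which such categories are better separated, which is what a similarity-based decision of this kind relies on.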