Rephrase detection identifies when a user restates an earlier query; it has long been treated as a task with pairwise input, which does not fully utilize contextual information (e.g., users' implicit feedback).
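As a minimal sketch of this framing (the class name, dimensions, and context-feature design below are illustrative assumptions, not a published model), a purely pairwise detector can be contrasted with one that also consumes session context:

```python
import torch
import torch.nn as nn

class RephraseClassifier(nn.Module):
    """Pairwise rephrase detector; context features (e.g., implicit-feedback
    signals from the session) can optionally be concatenated to the pair."""
    def __init__(self, dim: int = 256, ctx_dim: int = 0):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim + ctx_dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, q1_emb, q2_emb, ctx=None):
        feats = [q1_emb, q2_emb] + ([ctx] if ctx is not None else [])
        return torch.sigmoid(self.scorer(torch.cat(feats, dim=-1)))

# Pairwise-only vs. context-augmented scoring on toy query embeddings.
q1, q2 = torch.randn(4, 256), torch.randn(4, 256)
pairwise_score = RephraseClassifier()(q1, q2)
ctx = torch.randn(4, 32)  # hypothetical session/feedback features
contextual_score = RephraseClassifier(ctx_dim=32)(q1, q2, ctx)
```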
Self-learning paradigms in large-scale conversational AI agents tend to leverage user feedback to bridge the gap between what users say and what they mean.
Additionally, dependence on a fixed vocabulary limits subword models' adaptability across languages and domains.
However, these methods rarely address query expansion and entity weighting simultaneously, which may limit both the scope and the accuracy of retrieval with the reformulated query.
Individual user profiles and interaction histories play a significant role in providing customized experiences in real-world applications such as chatbots, social media, retail, and education.
Text Style Transfer (TST) aims to alter the style of a source text to a specified target style while preserving its content.
In this work, we go beyond existing paradigms and propose a novel approach to generating high-quality paraphrases from weakly supervised data.
Query rewriting (QR) systems are widely used to reduce the friction caused by errors in a spoken language understanding pipeline.
Spoken language understanding (SLU) systems in conversational AI agents often experience errors in the form of misrecognitions by automatic speech recognition (ASR) or semantic gaps in natural language understanding (NLU).
Then, inspired by the wide success of pre-trained contextual language embeddings, and as a way to compensate for insufficient QR training data, we propose a language-modeling (LM) based approach to pre-train query embeddings on historical user conversation data with a voice assistant.
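A minimal sketch of what such LM-based pre-training could look like, assuming a toy recurrent LM and random token IDs standing in for real conversation data (the architecture, vocabulary size, and hyperparameters are placeholders, not the proposed system):

```python
import torch
import torch.nn as nn

class QueryLM(nn.Module):
    """Small left-to-right LM; the final hidden state serves as a query embedding."""
    def __init__(self, vocab_size: int, dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):               # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))  # (batch, seq_len, dim)
        return self.head(h), h[:, -1, :]     # next-token logits, query embedding

vocab_size = 10_000                          # placeholder vocabulary size
model = QueryLM(vocab_size)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One pre-training step on a toy batch of historical queries (random IDs here).
batch = torch.randint(0, vocab_size, (32, 12))
logits, _ = model(batch[:, :-1])             # predict each next token
loss = loss_fn(logits.reshape(-1, vocab_size), batch[:, 1:].reshape(-1))
loss.backward()
opt.step()
```

After pre-training, the embedding returned by the forward pass can be reused as a query representation for downstream QR training.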
Typically, the accuracy of the ML models in these components is improved by manually transcribing and annotating data.
In this paper, we propose to distill the internal representations of a large model such as BERT into a simplified version of it.
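A minimal sketch of representation-level distillation, assuming teacher and student models that expose per-layer hidden states; the layer mapping, projection, and toy shapes below are illustrative assumptions rather than the paper's exact recipe:

```python
import torch
import torch.nn as nn

def distill_step(teacher_hs, student_hs, proj, layer_map):
    """MSE between projected student hidden states and selected teacher layers.

    teacher_hs / student_hs: lists of (batch, seq, dim) tensors, one per layer.
    layer_map: student layer index -> teacher layer index (an assumption here).
    """
    loss = 0.0
    for s_idx, t_idx in layer_map.items():
        loss = loss + nn.functional.mse_loss(
            proj(student_hs[s_idx]), teacher_hs[t_idx].detach())
    return loss / len(layer_map)

# Toy shapes: a 12-layer teacher (dim 768) distilled into a 4-layer student (dim 312).
teacher_hs = [torch.randn(8, 16, 768) for _ in range(12)]
student_hs = [torch.randn(8, 16, 312, requires_grad=True) for _ in range(4)]
proj = nn.Linear(312, 768)             # align student dim to teacher dim
layer_map = {0: 2, 1: 5, 2: 8, 3: 11}  # map each student layer to a teacher layer
loss = distill_step(teacher_hs, student_hs, proj, layer_map)
loss.backward()
```

In practice this representation loss would be combined with a task loss (e.g., on the teacher's output logits), but that weighting is left out of the sketch.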