2 code implementations • 18 Apr 2022 • Jack FitzGerald, Christopher Hench, Charith Peris, Scott Mackie, Kay Rottmann, Ana Sanchez, Aaron Nash, Liam Urbach, Vishesh Kakarala, Richa Singh, Swetha Ranganath, Laurie Crist, Misha Britan, Wouter Leeuwis, Gokhan Tur, Prem Natarajan
We present the MASSIVE dataset: the Multilingual Amazon SLU resource package (SLURP) for Slot-filling, Intent classification, and Virtual assistant Evaluation.
Ranked #1 on Slot Filling on MASSIVE
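As a quick orientation, here is a minimal sketch of loading one locale of MASSIVE. The Hugging Face hub identifier `AmazonScience/massive`, the `en-US` config name, and the field names are assumptions to verify against the official release.

```python
# Minimal sketch: load one locale of MASSIVE and inspect a slot-annotated example.
# Assumes the dataset is published on the Hugging Face hub as "AmazonScience/massive"
# with per-locale configs such as "en-US" -- check the official release.
from datasets import load_dataset

massive = load_dataset("AmazonScience/massive", "en-US")

example = massive["train"][0]
print(example["utt"])        # raw utterance text (field name assumed)
print(example["intent"])     # intent-classification label
print(example["annot_utt"])  # utterance with inline slot annotations
```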
Robots operating in human spaces must be able to engage in natural language interaction with people, both understanding and executing instructions, and using conversation to resolve ambiguity and recover from mistakes.
We show an average improvement of 35% in intent detection and 21% in slot tagging over a baseline model trained from the seed data.
Most existing work on the Guesser encodes the dialog history as a whole and trains Guesser models from scratch on the GuessWhat?! dataset.
Masked language models have revolutionized natural language processing systems in the past few years.
Embodied instruction following is a challenging problem requiring an agent to infer a sequence of primitive actions to achieve a goal environment state from complex language and visual inputs.
The second component of the research is the construction of a conversational agent model capable of injecting social language into an agent's responses while still preserving content.
Current conversational AI systems aim to understand a set of pre-designed requests and execute related actions, which limits their ability to evolve naturally and adapt based on human interactions.
We show that LRTA takes a step towards truly understanding the question, while the state-of-the-art model tends to learn superficial correlations from the training data.
Different flavors of transfer learning have shown tremendous impact in advancing research and applications of machine learning.
The insertion and drop modifications applied to the input text during WLM training resemble the kinds of noise introduced by Automatic Speech Recognition (ASR) errors; as a result, WLMs are likely to be more robust to ASR noise.
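A hedged sketch of that kind of insertion/drop corruption; the probabilities and the filler vocabulary are illustrative assumptions, not the actual training settings.

```python
import random

def corrupt_like_asr(tokens, vocab, p_drop=0.1, p_insert=0.1, seed=0):
    """Randomly drop tokens and insert spurious ones, loosely mimicking
    ASR deletion/insertion errors. Probabilities are illustrative only."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if rng.random() < p_drop:        # simulate an ASR deletion
            continue
        out.append(tok)
        if rng.random() < p_insert:      # simulate an ASR insertion
            out.append(rng.choice(vocab))
    return out

print(corrupt_like_asr("play the next song".split(),
                       vocab=["a", "the", "uh", "to"]))
```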
This work introduces Focused-Variation Network (FVN), a novel model to control language generation.
We show that the SNN outperforms the baseline by a relative 26.8% in Equal Error Rate (EER).
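For reference, the Equal Error Rate is the ROC operating point where the false-acceptance and false-rejection rates coincide; a minimal sketch with invented scores:

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """EER: the threshold where false-positive rate equals false-negative rate."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))  # closest crossing point
    return (fpr[idx] + fnr[idx]) / 2

labels = np.array([0, 0, 1, 1, 1, 0])          # invented ground truth
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])  # invented system scores
print(f"EER = {equal_error_rate(labels, scores):.3f}")
```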
no code implementations • 28 Jan 2020 • Yue Weng, Sai Sumanth Miryala, Chandra Khatri, Runze Wang, Huaixiu Zheng, Piero Molino, Mahdi Namazifar, Alexandros Papangelis, Hugh Williams, Franziska Bell, Gokhan Tur
As a baseline approach, we trained task-specific Statistical Language Models (SLM) and fine-tuned a state-of-the-art Generalized Pre-training (GPT) language model to re-rank the n-best ASR hypotheses, followed by a model to identify the dialog act and slots.
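A minimal sketch of LM-based n-best re-ranking, scoring each hypothesis by its average per-token negative log-likelihood under off-the-shelf GPT-2, which here stands in for the fine-tuned model; the hypothesis list is invented.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def lm_nll(text):
    """Average per-token negative log-likelihood; lower means more fluent."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return loss.item()

# Hypothetical n-best list from an ASR decoder.
hypotheses = ["book a table for too",
              "book a table for two",
              "book a cable for two"]
print(min(hypotheses, key=lm_nll))
```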
In this work, we propose to use the exploration approach of Go-Explore for solving text-based games.
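A minimal sketch of the archive-and-return loop at the core of Go-Explore, over an invented deterministic toy game; a real setup would wrap an engine such as TextWorld or Jericho, and the cell representation here (the raw observation) is a simplifying assumption.

```python
import random

class ToyTextGame:
    """Deterministic stand-in for a text game (entirely invented)."""
    def reset(self):
        self.history = ()
        return "start"

    def step(self, cmd):
        self.history += (cmd,)
        reward = 1.0 if self.history[-2:] == ("take key", "open door") else 0.0
        return " | ".join(self.history), reward

ACTIONS = ["go north", "take key", "open door", "look"]

def go_explore(episodes=300, horizon=4, seed=0):
    rng = random.Random(seed)
    env = ToyTextGame()
    # Archive maps a "cell" (here, the observation) to the shortest known
    # command trajectory that reaches it.
    archive = {env.reset(): []}
    best_traj, best_reward = [], 0.0
    for _ in range(episodes):
        # Phase 1: pick an archived cell and return to it by replaying its
        # trajectory (valid because the environment is deterministic).
        traj = list(archive[rng.choice(list(archive))])
        env.reset()
        for cmd in traj:
            env.step(cmd)
        # Phase 2: explore from that state, archiving newly reached cells.
        for _ in range(max(0, horizon - len(traj))):
            cmd = rng.choice(ACTIONS)
            obs, reward = env.step(cmd)
            traj.append(cmd)
            if obs not in archive or len(traj) < len(archive[obs]):
                archive[obs] = list(traj)
            if reward > best_reward:
                best_traj, best_reward = list(traj), reward
    return best_traj, best_reward

print(go_explore())
```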
Plato has been designed to be easy to understand and debug and is agnostic to the underlying learning frameworks that train each component.
It is based on a simple and practical yet very effective sequence-to-sequence approach, where language understanding and state tracking tasks are modeled jointly with a structured copy-augmented sequential decoder and a multi-label decoder for each slot.
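To make the per-slot multi-label decoding idea concrete, a small sketch follows; the slot names, value counts, and pooled-encoding interface are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PerSlotMultiLabelDecoder(nn.Module):
    """Illustrative head: one independent multi-label classifier per slot,
    applied to a pooled dialogue encoding (slots and sizes are invented)."""
    def __init__(self, hidden_size, slot_value_counts):
        super().__init__()
        self.heads = nn.ModuleDict({
            slot: nn.Linear(hidden_size, n_values)
            for slot, n_values in slot_value_counts.items()
        })

    def forward(self, encoding):
        # Sigmoid per value: each slot may take zero, one, or several values.
        return {slot: torch.sigmoid(head(encoding))
                for slot, head in self.heads.items()}

decoder = PerSlotMultiLabelDecoder(
    hidden_size=256,
    slot_value_counts={"cuisine": 8, "price_range": 3})
probs = decoder(torch.randn(1, 256))
print({slot: p.shape for slot, p in probs.items()})
```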
Our system consists of two major components: intent detection and reply retrieval, which are very different from standard smart reply systems where the task is to directly predict a reply.
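A hedged sketch of that two-stage pipeline, with TF-IDF plus nearest-neighbour retrieval standing in for the actual models; all utterances, intents, and replies are invented.

```python
# Stage 1 classifies the intent; stage 2 retrieves a reply from that
# intent's candidate pool by embedding similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

train_utts = ["when will my package arrive", "i want my money back",
              "where is my order", "refund this item please"]
train_intents = ["shipping", "refund", "shipping", "refund"]
replies = {"shipping": ["Your order is on the way.", "Tracking info was emailed."],
           "refund": ["A refund has been initiated.", "Please allow 3-5 days."]}

vec = TfidfVectorizer().fit(train_utts + sum(replies.values(), []))
clf = LogisticRegression().fit(vec.transform(train_utts), train_intents)

def respond(utterance):
    intent = clf.predict(vec.transform([utterance]))[0]    # stage 1: intent
    pool = replies[intent]
    sims = cosine_similarity(vec.transform([utterance]),   # stage 2: retrieval
                             vec.transform(pool))[0]
    return pool[sims.argmax()]

print(respond("has my parcel shipped yet"))
```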
The agents each have their own objectives and can only interact via the natural language they generate.
We further develop several variants: a latent variable model that injects random variation to promote diversity in simulated user responses, and a novel goal regularization mechanism that penalizes divergence of user responses from the initial user goal.
To address this challenge, we propose a hybrid imitation and reinforcement learning method with which a dialogue agent can learn effectively from its interactions with users, through both human teaching and feedback.
We show that deep-RL-based optimization leads to a significant improvement in task success rate and a reduction in dialogue length compared to the supervised training model.
While multi-task training of such models alleviates the need for large in-domain annotated datasets, bootstrapping a semantic parsing model for a new domain using only the semantic frame, such as the back-end API or knowledge graph schema, is still one of the holy grail tasks of language understanding for dialogue systems.
We compare the performance of our proposed architecture with two context models, one that uses just the previous turn context and another that encodes dialogue context in a memory network, but loses the order of utterances in the dialogue history.
Natural language understanding (NLU) is a core component of a spoken dialogue system.
We propose a novel zero-shot learning method for semantic utterance classification (SUC).
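One common way to realize zero-shot utterance classification is to embed the utterance and natural-language label descriptions in a shared space and pick the nearest label. The sketch below uses sentence-transformers as a stand-in and is not necessarily the paper's method; the model name and labels are assumptions.

```python
# Zero-shot classification via embedding similarity: no training examples of
# the target classes are needed, only their natural-language descriptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
labels = ["play music", "set an alarm", "check the weather"]  # invented classes

def classify_zero_shot(utterance):
    u = model.encode(utterance, convert_to_tensor=True)
    l = model.encode(labels, convert_to_tensor=True)
    scores = util.cos_sim(u, l)[0]       # cosine similarity to every label
    return labels[int(scores.argmax())]

print(classify_zero_shot("wake me up at 7 tomorrow"))
```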