In this work, we take a close look at the movie domain and present a large-scale, high-quality corpus with fine-grained annotations, in the hope of pushing the limits of movie-domain chatbots.
Documents as short as a single sentence may inadvertently reveal sensitive information about their authors, including, e.g., their gender or ethnicity.
We propose a shared task on training instance selection for few-shot neural text generation.
In this work, we present a study on training instance selection in few-shot neural text generation.
We believe that learning fine-grained correspondence between each single fact and law articles is crucial for an accurate and trustworthy AI system.
Our approach automatically augments the data available for training by (i) generating new text samples in which specific values are replaced by alternative values from the same category, (ii) generating new text samples with GPT-2, and (iii) automatically pairing the new text samples with data samples.
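The value-replacement step in (i) can be sketched as follows. The category lexicon, slot annotations, and all names here are illustrative assumptions, not the paper's actual implementation:

```python
import random

# Illustrative category lexicon (assumption: a real system would derive
# these values from the annotated training data).
CATEGORY_VALUES = {
    "CITY": ["Paris", "Berlin", "Madrid"],
    "DATE": ["Monday", "Friday"],
}

def augment(text, slots, rng=None):
    """Create a new text sample by replacing each annotated slot value
    with an alternative value from the same category."""
    rng = rng or random.Random(0)
    new_text = text
    for value, category in slots.items():
        alternatives = [v for v in CATEGORY_VALUES[category] if v != value]
        new_text = new_text.replace(value, rng.choice(alternatives))
    return new_text

sample = augment("Flights to Paris on Monday",
                 {"Paris": "CITY", "Monday": "DATE"})
```

Because the replacement stays within the original category, the new sample keeps the structure of the source sentence, which also makes step (iii), pairing the text with its underlying data, straightforward.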
Utterance classification is a key component in many conversational systems.
We present a lightweight annotation tool, the Data AnnotatoR Tool (DART), for the general task of labeling structured data with textual descriptions.
In particular, these image features are subdivided into global and local features: global features are extracted from a representation of the whole image, while local features are extracted from the objects detected within it.
Neural network-based sequence-to-sequence (seq2seq) models suffer severely from the low-diversity problem in open-domain dialogue generation.
The neural attention model has achieved great success in data-to-text generation tasks.
In this work, we develop techniques targeted at bridging the gap between Pidgin English and English in the context of natural language generation.
First, the word-graph approach, which simply concatenates fragments from multiple sentences, may yield non-fluent or ungrammatical compressions.
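The failure mode above is easy to reproduce with a deliberately simplified word-graph compressor (my toy illustration, not any particular paper's system): sentences are merged into a graph keyed by surface word, and a compression is read off as the shortest path between the start and end markers. Real systems add POS and context checks before merging nodes; omitting them is precisely what lets ungrammatical fragments through:

```python
from collections import defaultdict
from heapq import heappop, heappush

def build_word_graph(sentences):
    """Merge sentences into a word graph: one node per word form, with an
    edge for every adjacency observed in any sentence (no POS or context
    checks, so unrelated fragments become connected)."""
    edges = defaultdict(set)
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            edges[a].add(b)
    return edges

def shortest_compression(edges):
    """Return the fewest-word path from <s> to </s> (uniform-cost search)."""
    queue = [(0, ["<s>"])]
    seen = set()
    while queue:
        cost, path = heappop(queue)
        node = path[-1]
        if node == "</s>":
            return " ".join(path[1:-1])
        if node in seen:
            continue
        seen.add(node)
        for nxt in edges[node]:
            heappush(queue, (cost + 1, path + [nxt]))
    return ""

sents = ["the cat sat on the mat", "the dog sat quietly"]
compression = shortest_compression(build_word_graph(sents))
```

On this toy input the shortest path is the fragment "the mat": the two occurrences of "the" are merged into one node, splicing the end of the first sentence directly onto its beginning, which is exactly the kind of non-fluent output the criticism refers to.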
To properly train the utterance rewriter, we collect a new dataset with human annotations and introduce a Transformer-based utterance rewriting architecture using the pointer network.
Sequence-to-Sequence (seq2seq) models have become overwhelmingly popular in building end-to-end trainable dialogue systems.
Deep latent variable models have been shown to facilitate the response generation for open-domain dialog systems.