We demonstrate that in contradiction to the previous observation in the case of monolingual distillation, in multilingual settings, distillation during pretraining is more effective than distillation during fine-tuning for zero-shot transfer learning.
Use of synthetic data is rapidly emerging as a realistic alternative to manually annotating live traffic for industry-scale model building.
However, given the trend of larger pre-trained models, fine-tuning these models for each downstream task is parameter-inefficient and computationally-expensive deeming this approach sub-optimal for adoption by NLU systems.
(2) Conversely, using an encoder to warm-start seq2seq training, we show that by unfreezing the encoder partway through training, we can match task performance of a from-scratch seq2seq model.
Currently popular video-text data mining approach via automatic speech recognition (ASR) used in HowTo100M provides low-quality captions that often do not refer to the video content.
In this work, we present an architecture to perform such domain adaptation automatically, with only a small amount of metadata about the new domain and without any new training data (zero-shot) or with very few examples (few-shot).
A bottleneck to developing Semantic Parsing (SP) models is the need for a large volume of human-labeled training data.
We present LINGUIST, a method for generating annotated data for Intent Classification and Slot Tagging (IC+ST), via fine-tuning AlexaTM 5B, a 5-billion-parameter multilingual sequence-to-sequence (seq2seq) model, on a flexible instruction prompt.
1 code implementation • 2 Aug 2022 • Saleh Soltan, Shankar Ananthakrishnan, Jack FitzGerald, Rahul Gupta, Wael Hamza, Haidar Khan, Charith Peris, Stephen Rawls, Andy Rosenbaum, Anna Rumshisky, Chandana Satya Prakash, Mukund Sridhar, Fabian Triefenbach, Apurv Verma, Gokhan Tur, Prem Natarajan
In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks.
Ranked #8 on Natural Language Inference on CommitmentBank
no code implementations • 15 Jun 2022 • Jack FitzGerald, Shankar Ananthakrishnan, Konstantine Arkoudas, Davide Bernardi, Abhishek Bhagia, Claudio Delli Bovi, Jin Cao, Rakesh Chada, Amit Chauhan, Luoxin Chen, Anurag Dwarakanath, Satyam Dwivedi, Turan Gojayev, Karthik Gopalakrishnan, Thomas Gueudre, Dilek Hakkani-Tur, Wael Hamza, Jonathan Hueser, Kevin Martin Jose, Haidar Khan, Beiye Liu, Jianhua Lu, Alessandro Manzotti, Pradeep Natarajan, Karolina Owczarzak, Gokmen Oz, Enrico Palumbo, Charith Peris, Chandana Satya Prakash, Stephen Rawls, Andy Rosenbaum, Anjali Shenoy, Saleh Soltan, Mukund Harakere Sridhar, Liz Tan, Fabian Triefenbach, Pan Wei, Haiyang Yu, Shuai Zheng, Gokhan Tur, Prem Natarajan
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9. 3B, their subsequent distillation into smaller models ranging from 17M-170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system.
Semantic parsing is an important NLP problem, particularly for voice assistants such as Alexa and Google Assistant.
Understanding human language often necessitates understanding entities and their place in a taxonomy of knowledge -- their types.
In commercial dialogue systems, the Spoken Language Understanding (SLU) component tends to have numerous domains thus context is needed to help resolve ambiguities.
Dialog State Tracking (DST), an integral part of modern dialog systems, aims to track user preferences and constraints (slots) in task-oriented dialogs.
Since our model can process both speech and text input sequences and learn to predict a target sequence, it also allows us to do zero-shot E2E SLU by training on only text-hypothesis data (without any speech) from a new domain.
Ranked #3 on Spoken Language Understanding on Snips-SmartLights
The comparison shows that our model could recover the transcription by integrating the fragmented information among hypotheses and identifying the frequent error patterns of the ASR module, and even rewrite the query for a better understanding, which reveals the characteristic of multi-task learning of broadcasting knowledge.
The light encoder architecture separates the shared pre-trained networks from the mappings of generally encoded knowledge to specific domains of SLU, allowing for the domain adaptation to be performed solely at the light encoder and thus increasing efficiency.
For complex parsing tasks, the state-of-the-art method is based on autoregressive sequence to sequence models to generate the parse directly.
Virtual assistants such as Amazon Alexa, Apple Siri, and Google Assistant often rely on a semantic parsing component to understand which action(s) to execute for an utterance spoken by its users.
The NLU module usually uses the first best interpretation of a given speech in downstream tasks such as domain and intent classification.
We propose an entity-centric neural cross-lingual coreference model that builds on multi-lingual embeddings and language-independent features.
The task of natural question generation is to generate a corresponding question given the input passage (fact) and answer.
Ranked #11 on Question Generation on SQuAD1.1
A major challenge in Entity Linking (EL) is making effective use of contextual information to disambiguate mentions to Wikipedia that might refer to different entities in different contexts.
Ranked #3 on Entity Disambiguation on TAC2010
In the QG task, a question is generated from the system given the passage and the target answer, whereas in the QA task, the answer is generated given the question and the passage.
However, it lacks the capacity of utilizing instance-level information from individual instances in the training set.
This paper describes an application of reinforcement learning to the mention detection task.
Natural language sentence matching is a fundamental technology for a variety of tasks.
Ranked #26 on Natural Language Inference on SNLI
Based on this dataset, we propose a Multi-Perspective Context Matching (MPCM) model, which is an end-to-end system that directly predicts the answer beginning and ending points in a passage.
Ranked #44 on Question Answering on SQuAD1.1 dev