An emerging, theoretically justified, and effective approach to dealing with forgetting is to train a task-specific model for each task within a single network shared by all tasks, following a task-incremental learning (TIL) method.
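As a rough illustration of this TIL setup (a minimal PyTorch sketch with illustrative names, not the formulation of any particular paper), a single backbone can be shared across tasks while each task gets its own lightweight head selected by the task id:

```python
import torch
import torch.nn as nn

class SharedTILModel(nn.Module):
    """Shared backbone with one task-specific head per task (illustrative only)."""

    def __init__(self, hidden_dim, classes_per_task):
        super().__init__()
        # Parameters shared by all tasks.
        self.backbone = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        # One lightweight head per task; when training task t, only heads[t]
        # (and, depending on the method, protected parts of the backbone)
        # is updated, which limits interference between tasks.
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, c) for c in classes_per_task])

    def forward(self, x, task_id):
        return self.heads[task_id](self.backbone(x))

# Usage: the task id selects the matching head at both training and test time.
model = SharedTILModel(hidden_dim=768, classes_per_task=[4, 5, 3])
logits = model(torch.randn(2, 768), task_id=1)
```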
Despite the great success of pre-trained language models, it remains challenging to use these models for continual learning, especially in the class-incremental learning (CIL) setting, due to catastrophic forgetting (CF).
We take the first step by focusing on event commonsense, which considers events and their relations and is crucial in both dialogues and general commonsense reasoning.
A novel proxy is also proposed to preserve the general knowledge in the original LM.
This paper shows that the existing methods are suboptimal and proposes a novel method that adapts the knowledge in the LM in a more informed way by (1) soft-masking the attention heads based on their importance so as to best preserve the general knowledge in the LM, and (2) contrasting the representations of the general knowledge and of the full (general plus domain) knowledge to learn an integrated representation covering both general and domain-specific knowledge.
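To make the two components concrete, here is a minimal sketch (the gradient-norm importance proxy, function names, and tensor shapes are assumptions for illustration, not the paper's exact formulation): head importance scores scale down gradient updates to heads that matter for the general knowledge, and an InfoNCE-style loss aligns the general and the full representations:

```python
import torch
import torch.nn.functional as F

def head_importance_from_grads(head_grad_norms):
    """Map accumulated per-head gradient norms (shape: [num_layers, num_heads])
    to [0, 1] importance scores; the gradient-norm proxy is an assumption here."""
    imp = head_grad_norms.abs()
    return (imp - imp.min()) / (imp.max() - imp.min() + 1e-8)

def soft_mask_head_gradients(head_params, importance):
    """Scale each attention head's gradient by (1 - importance): heads that are
    important for the general knowledge are updated less during domain training.

    head_params: hypothetical dict mapping (layer, head) -> parameter tensor
                 of that head's projection weights.
    """
    for (layer, head), param in head_params.items():
        if param.grad is not None:
            param.grad.mul_(1.0 - importance[layer, head])

def contrastive_integration_loss(z_general, z_full, temperature=0.1):
    """InfoNCE-style loss that aligns the full (general + domain) representation
    of an input with its general-only representation, contrasting it against
    the other items in the batch."""
    z_general = F.normalize(z_general, dim=-1)
    z_full = F.normalize(z_full, dim=-1)
    logits = z_full @ z_general.t() / temperature
    targets = torch.arange(z_full.size(0), device=z_full.device)
    return F.cross_entropy(logits, targets)
```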
However, current approaches to number-rich tasks with transformer-based language models abandon or lose some of the numeracy information, e.g., by breaking numbers into sub-word tokens, which leads to many number-related errors.
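For example (a small illustration using the Hugging Face tokenizer API; the exact sub-word pieces depend on the model's vocabulary), a standard sub-word tokenizer splits a multi-digit number into fragments whose embeddings carry no notion of its magnitude:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# The number is split into several sub-word pieces, so the model never sees
# a single token (or embedding) that represents its value.
print(tokenizer.tokenize("The revenue grew to 12345.67 million"))
# possible output: ['the', 'revenue', 'grew', 'to', '123', '##45', '.', '67', 'million']
```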
To assist form designers, in this work we present FormLM to model online forms (by enhancing a pre-trained language model with form structural information) and to recommend form creation ideas (including question/option recommendations and block type suggestions).
Recent work applying large language models (LMs) has achieved impressive performance in many NLP applications.
Tabular data analysis is performed every day across various domains.
Recently, contrastive learning with data augmentation and pseudo-class creation has been shown to produce markedly better results for out-of-distribution (OOD) detection than previous methods.
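A rough sketch of that recipe follows (assuming rotation-based augmentations as the pseudo classes and a supervised-contrastive objective; both are illustrative choices, not necessarily the ones used in this work). In-distribution samples then form tight pseudo-class clusters, and distance to those clusters can serve as an OOD score:

```python
import torch
import torch.nn.functional as F

def pseudo_class_batch(images, labels, num_rotations=4):
    """Turn each rotation of an image into its own (pseudo) class label."""
    views, new_labels = [], []
    num_classes = int(labels.max()) + 1
    for k in range(num_rotations):
        views.append(torch.rot90(images, k, dims=(-2, -1)))
        # rotation k of original class c becomes pseudo class c + k * num_classes
        new_labels.append(labels + k * num_classes)
    return torch.cat(views), torch.cat(new_labels)

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """Pull embeddings with the same (pseudo) class together, push others apart."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    # softmax over all non-self pairs; average the log-probability of positives
    sim = sim.masked_fill(self_mask, -1e9)
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    return -(log_prob * pos_mask.float()).sum(dim=1).div(pos_count).mean()
```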