To our knowledge, only one method has been proposed to learn a sequence of mixed tasks.
Although several techniques have achieved learning with no CF, they attain it by letting each task monopolize a sub-network within a shared network, which severely limits knowledge transfer (KT) and over-consumes the network capacity, i.e., performance deteriorates as more tasks are learned.
The key theoretical result is that, regardless of whether WP (within-task prediction) and OOD detection (or TP, task-id prediction) are defined explicitly or implicitly by a CIL algorithm, good WP and good OOD detection are necessary and sufficient conditions for good CIL. This result unifies novelty/OOD detection and continual learning (CIL in particular).
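A minimal sketch of the decomposition behind this result, using notation that is not in the snippet above (x is a test input and j_i is a class of task i):

```latex
% Hypothetical notation: x is a test input, j_i is a class belonging to task i.
% The CIL prediction factors into a within-task prediction (WP) term and a
% task-prediction (TP) term; TP can be carried out by per-task OOD detection.
P(y = j_i \mid x) \;=\;
  \underbrace{P(y = j_i \mid x,\ \text{task } i)}_{\text{WP}}
  \cdot
  \underbrace{P(\text{task } i \mid x)}_{\text{TP (via OOD detection)}}
```

Read this way, improving either factor while the other is held fixed improves the CIL prediction, which is the sense in which good WP and good OOD detection together characterize good CIL.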
A novel proxy is also proposed to preserve the general knowledge in the original LM.
Ranked #1 on Continual Pretraining on SciERC
This paper shows that existing methods are suboptimal and proposes a novel method that performs a more informed adaptation of the knowledge in the LM by (1) soft-masking the attention heads based on their importance, to best preserve the general knowledge in the LM, and (2) contrasting the representations of the general knowledge and of the full knowledge (both general and domain-specific) to learn an integrated representation that covers both.
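A hedged sketch of what gradient-level soft-masking of attention heads could look like in PyTorch; the function and variable names below are illustrative assumptions, not the paper's code, and the per-head importance scores are assumed to be precomputed:

```python
import torch

def soft_mask_grads_by_head(weight: torch.Tensor, head_importance: torch.Tensor) -> None:
    """Scale each attention head's gradient by (1 - importance), so heads judged
    important for the LM's general knowledge are updated less during
    domain-adaptive training. Illustrative sketch only, not the paper's code."""
    if weight.grad is None:
        return
    num_heads = head_importance.numel()
    # weight of a per-head projection, shape (num_heads * head_dim, hidden)
    grad = weight.grad.view(num_heads, -1, weight.shape[-1])
    grad *= (1.0 - head_importance).clamp(0.0, 1.0).view(num_heads, 1, 1)

# Hypothetical usage: after loss.backward() and before optimizer.step(),
# apply the mask to each attention projection, layer by layer:
#   for layer, imp in zip(model.encoder.layer, importance_per_layer):
#       soft_mask_grads_by_head(layer.attention.self.query.weight, imp)
```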
Continual learning (CL) is a learning paradigm that emulates the human capability of continually learning and accumulating knowledge without forgetting previously learned knowledge, and of transferring the learned knowledge to help learn new tasks better.
Recent work applying large language models (LMs) has achieved impressive performance in many NLP applications.
Ranked #1 on Continual Pretraining on AG News
Instead of using the saved samples in memory to update the network for previous tasks/classes, as existing approaches do, MORE leverages the saved samples to build a task-specific classifier (adding a new classification head) without updating the network learned for previous tasks/classes.
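A minimal sketch of the frozen-backbone, one-head-per-task idea described above; the class and method names are assumptions for illustration, not the MORE implementation:

```python
import torch
import torch.nn as nn

class FrozenBackboneWithTaskHeads(nn.Module):
    """Illustrative sketch: one new classification head per task, trained on the
    current task's data plus the saved memory samples, while the network learned
    for previous tasks/classes stays frozen."""

    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone
        self.feat_dim = feat_dim
        for p in self.backbone.parameters():
            p.requires_grad = False          # previous knowledge is not updated
        self.heads = nn.ModuleList()         # one classification head per task

    def add_task_head(self, num_classes: int) -> nn.Module:
        head = nn.Linear(self.feat_dim, num_classes)
        self.heads.append(head)              # only this new head is trained
        return head

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        with torch.no_grad():                # frozen feature extractor
            feats = self.backbone(x)
        return self.heads[task_id](feats)
```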
In many real-world machine learning applications, samples belong to a set of domains, e.g., for product reviews, each review belongs to a product category.
To the best of our knowledge, no technique has been proposed to learn a sequence of mixed similar and dissimilar tasks that can deal with forgetting and also transfer knowledge forward and backward.
Ranked #1 on Continual Learning on F-CelebA (10 tasks)
In this setting, the CL system learns a sequence of SC tasks incrementally in a neural network, where each task builds a classifier to classify the sentiment of reviews of a particular product category or domain.
Ranked #4 on Continual Learning on DSC (10 tasks)
This paper studies continual learning (CL) of a sequence of aspect sentiment classification (ASC) tasks.
Ranked #3 on Continual Learning on ASC (19 tasks)
The key novelty is a contrastive continual learning method that enables both knowledge transfer across tasks and knowledge distillation from old tasks to the new task, which eliminates the need for task ids in testing.
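A hedged sketch of how a contrastive term (for knowledge transfer) and a distillation term (for preserving old-task knowledge) might be combined; the function name, the cosine-distance distillation, and the weighting are illustrative assumptions, not the authors' exact loss:

```python
import torch
import torch.nn.functional as F

def contrastive_distill_loss(feats_new, feats_old, labels, temperature=0.1, alpha=0.5):
    # Hedged sketch of the general idea (not the authors' exact formulation):
    # a supervised contrastive term pulls same-class representations together
    # (knowledge transfer across tasks), and a distillation term keeps the new
    # model's representations close to a frozen copy of the old model
    # (preserving old-task knowledge), so no task id is needed at test time.
    z_new = F.normalize(feats_new, dim=-1)
    z_old = F.normalize(feats_old, dim=-1)

    # supervised contrastive term over the mini-batch
    sim = z_new @ z_new.t() / temperature
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(eye, -1e9)                    # exclude self-similarity
    same_class = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    log_prob = F.log_softmax(sim, dim=1)
    pos_per_row = same_class.sum(dim=1).clamp(min=1)
    contrastive = -(log_prob * same_class.float()).sum(dim=1) / pos_per_row

    # distillation term: cosine distance to the old (frozen) model's features
    distill = 1.0 - (z_new * z_old).sum(dim=-1)

    return contrastive.mean() + alpha * distill.mean()
```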
Although several papers have tried to deal with both CF and KT, our experiments show that they suffer from serious CF when the tasks do not have much shared knowledge.
Ranked #1 on Continual Learning on DSC (10 tasks)
The goal of this work is to endow such systems with the additional ability to transfer knowledge among tasks when the tasks are similar and have shared knowledge to achieve higher accuracy.
PLS is not only simple and efficient but also preserves data privacy because it works in the latent feature space.
While the vast majority of existing work on automated essay scoring has focused on holistic scoring, researchers have recently begun work on scoring specific dimensions of essay quality.
While argument persuasiveness is one of the most important dimensions of argumentative essay quality, it is relatively little studied in automated essay scoring research.