Continual Pretraining

7 papers with code • 1 benchmark • 1 dataset

Most implemented papers

Continual Training of Language Models for Few-Shot Learning

uic-liu-lab/cpt 11 Oct 2022

Recent work on applying large language models (LMs) achieves impressive performance in many NLP applications.

ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning

pluslabnlp/econet EMNLP 2021

While pre-trained language models (PTLMs) have achieved noticeable success on many NLP tasks, they still struggle with tasks that require event temporal reasoning, which is essential for event-centric applications.

Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning

vano1205/efficientcl EMNLP 2021

We introduce EfficientCL, a memory-efficient continual pretraining method that applies contrastive learning with novel data augmentation and curriculum learning.

On the Robustness of Reading Comprehension Models to Entity Renaming

ink-usc/entity-robustness NAACL 2022

We study the robustness of machine reading comprehension (MRC) models to entity renaming -- do models make more wrong predictions when the same questions are asked about an entity whose name has been changed?

Fortunately, Discourse Markers Can Enhance Language Models for Sentiment Analysis

ibm/tslm-discourse-markers 6 Jan 2022

In recent years, pretrained language models have revolutionized the NLP world, achieving state-of-the-art performance in various downstream tasks.

Hierarchical Label-wise Attention Transformer Model for Explainable ICD Coding

leiboliu/hilat 22 Apr 2022

In this study, we propose a hierarchical label-wise attention Transformer model (HiLAT) for the explainable prediction of ICD codes from clinical documents.

Continual Pre-Training Mitigates Forgetting in Language and Vision

andreacossu/continual-pretraining-nlp-vision 19 May 2022

We formalize and investigate the characteristics of the continual pre-training scenario in both language and vision environments, where a model is continually pre-trained on a stream of incoming data and only later fine-tuned to different downstream tasks.