Continual Pretraining

18 papers with code • 3 benchmarks • 3 datasets

Continual pretraining extends the pretraining of an existing model on new incoming data (e.g., additional domains, tasks, or longer contexts) before fine-tuning, with the goal of acquiring new knowledge while preserving the general knowledge already in the original model.

Most implemented papers

Continual Training of Language Models for Few-Shot Learning

uic-liu-lab/cpt 11 Oct 2022

Recent work applying large language models (LMs) has achieved impressive performance in many NLP applications.

ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning

pluslabnlp/econet EMNLP 2021

While pre-trained language models (PTLMs) have achieved noticeable success on many NLP tasks, they still struggle for tasks that require event temporal reasoning, which is essential for event-centric applications.

Continual Pre-training of Language Models

UIC-Liu-Lab/ContinualLM 7 Feb 2023

A novel proxy is also proposed to preserve the general knowledge in the original LM.
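The proposed proxy is specific to this paper; as a generic stand-in only, the sketch below shows one common way to preserve the original LM's general knowledge during continual pretraining: add a KL penalty that ties the updated model's predictions to a frozen copy of the original model. The coefficient and setup are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def continual_step(model, frozen_original, batch, optimizer, kl_coeff=0.1):
    """One continual pre-training step with a simple knowledge-preservation penalty.

    `frozen_original` is a frozen snapshot of the LM taken before training on the
    new domain; `batch` contains masked-LM inputs and labels.  The KL term keeps
    the updated model's predictions close to those of the original model.
    """
    outputs = model(**batch)                     # new-domain masked-LM loss in outputs.loss
    with torch.no_grad():
        ref_logits = frozen_original(**batch).logits
    kl = F.kl_div(F.log_softmax(outputs.logits, dim=-1),
                  F.softmax(ref_logits, dim=-1), reduction="batchmean")
    loss = outputs.loss + kl_coeff * kl          # domain loss + preservation term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```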

AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts

hiyouga/llama-factory 12 Feb 2024

To improve language models' proficiency in mathematical reasoning via continual pretraining, we introduce a novel strategy that leverages base language models for autonomous data selection.
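As a rough illustration of autonomous data selection (not the paper's exact prompt or scoring rule), the sketch below uses a base LM's preference for "YES" over "NO" as a zero-shot quality score for mathematical text and keeps only high-scoring documents for continual pretraining. The prompt wording, stand-in model, and threshold are assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # small stand-in; the paper relies on stronger base models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

PROMPT = ("Does the following text have educational value for learning "
          "mathematical reasoning? Answer YES or NO.\n\nText:\n{doc}\n\nAnswer:")

@torch.no_grad()
def math_quality_score(doc: str) -> float:
    """Return P(YES) / (P(YES) + P(NO)) for the token following the prompt."""
    inputs = tokenizer(PROMPT.format(doc=doc), return_tensors="pt",
                       truncation=True, max_length=512)
    next_token_logits = model(**inputs).logits[0, -1]
    yes_id = tokenizer(" YES", add_special_tokens=False).input_ids[0]
    no_id = tokenizer(" NO", add_special_tokens=False).input_ids[0]
    probs = torch.softmax(next_token_logits[[yes_id, no_id]], dim=-1)
    return probs[0].item()

docs = ["Let f(x) = x^2. Then f'(x) = 2x by the power rule.",
        "Top 10 celebrity diets you won't believe!"]
selected = [d for d in docs if math_quality_score(d) > 0.5]  # threshold is an assumption
```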

Data Engineering for Scaling Language Models to 128K Context

franxyao/long-context-data-engineering 15 Feb 2024

We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.
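The sketch below illustrates one simple way to assemble such data: upsample long documents within each domain while keeping the cross-domain mixture ratios, before packing the result into long sequences for continual pretraining. The domain weights, length cutoff, and upsampling factor are illustrative assumptions, not the paper's recipe.

```python
import random

def build_long_context_mixture(corpus_by_domain, domain_weights,
                               long_cutoff=32_768, upsample=4, n_docs=10_000):
    """corpus_by_domain: dict[str, list[str]] of raw documents per domain."""
    mixture = []
    for domain, docs in corpus_by_domain.items():
        # Repeat long documents so they are not drowned out by short ones,
        # while keeping the original cross-domain mixture ratios.
        weighted = []
        for doc in docs:
            reps = upsample if len(doc) >= long_cutoff else 1
            weighted.extend([doc] * reps)
        k = max(1, int(n_docs * domain_weights.get(domain, 0.0)))
        mixture.extend(random.choices(weighted, k=k))
    random.shuffle(mixture)
    return mixture
```

The selected documents would then be concatenated and packed into long (e.g. 128K-token) training sequences for the continual pretraining run.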

Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning

vano1205/efficientcl EMNLP 2021

We introduce EfficientCL, a memory-efficient continual pretraining method that applies contrastive learning with novel data augmentation and curriculum learning.
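A minimal sketch of the general idea, assuming a simplified cutoff-style dropout augmentation and a linear curriculum over its strength (stand-ins for the paper's cutoff and PCA-jittering schedule):

```python
import torch
import torch.nn.functional as F

def augment(h: torch.Tensor, drop_rate: float) -> torch.Tensor:
    """Randomly zero out a fraction of hidden dimensions (cutoff-like augmentation)."""
    mask = (torch.rand_like(h) > drop_rate).float()
    return h * mask

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """Contrastive loss: views of the same example (diagonal) are the positives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)

# Curriculum: augmentation strength ramps up linearly over training steps.
hidden = torch.randn(32, 768)                     # stand-in for encoder outputs
total_steps = 1000
for step in range(total_steps):
    drop_rate = 0.05 + 0.25 * step / total_steps  # easy -> hard augmentations
    loss = info_nce(augment(hidden, drop_rate), augment(hidden, drop_rate))
    # loss.backward(); optimizer.step() would follow in real training
```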

On the Robustness of Reading Comprehension Models to Entity Renaming

ink-usc/entity-robustness NAACL 2022

We study the robustness of machine reading comprehension (MRC) models to entity renaming -- do models make more wrong predictions when the same questions are asked about an entity whose name has been changed?

Fortunately, Discourse Markers Can Enhance Language Models for Sentiment Analysis

ibm/tslm-discourse-markers 6 Jan 2022

In recent years, pretrained language models have revolutionized the NLP world, achieving state-of-the-art performance on various downstream tasks.

Hierarchical Label-wise Attention Transformer Model for Explainable ICD Coding

leiboliu/hilat 22 Apr 2022

In this study, we propose a hierarchical label-wise attention Transformer model (HiLAT) for the explainable prediction of ICD codes from clinical documents.
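A minimal sketch of the label-wise attention idea at HiLAT's core: each ICD code gets its own attention distribution over token representations from a (continually pretrained) encoder, and the resulting per-label document vectors feed per-label classifiers. Dimensions are illustrative, and the hierarchical chunk encoder is omitted for brevity.

```python
import torch
import torch.nn as nn

class LabelWiseAttention(nn.Module):
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.label_queries = nn.Linear(hidden_dim, num_labels)  # one attention query per label
        self.classifiers = nn.Linear(hidden_dim, num_labels)    # one classifier per label

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_dim) from the document encoder
        attn = torch.softmax(self.label_queries(token_states), dim=1)  # (batch, seq, labels)
        label_docs = torch.einsum("bsl,bsh->blh", attn, token_states)  # per-label doc vectors
        logits = (self.classifiers.weight * label_docs).sum(-1) + self.classifiers.bias
        return logits                                                   # (batch, num_labels)

logits = LabelWiseAttention(hidden_dim=768, num_labels=50)(torch.randn(2, 512, 768))
```

The per-label attention weights also provide the token-level explanations referenced in the title.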

Continual Pre-Training Mitigates Forgetting in Language and Vision

andreacossu/continual-pretraining-nlp-vision 19 May 2022

We formalize and investigate the characteristics of the continual pre-training scenario in both language and vision environments, where a model is continually pre-trained on a stream of incoming data and only later fine-tuned to different downstream tasks.
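A minimal sketch of this scenario for the language side, assuming Hugging Face Transformers and two stand-in corpora as the incoming stream; dataset choices and hyperparameters are illustrative, not the paper's setup.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

# Stand-in "stream" of incoming pre-training corpora (e.g. successive domains).
stream = [
    load_dataset("wikitext", "wikitext-2-raw-v1", split="train"),
    load_dataset("ag_news", split="train"),
]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

for stage, dataset in enumerate(stream):
    tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
    args = TrainingArguments(output_dir=f"cpt-stage-{stage}", num_train_epochs=1,
                             per_device_train_batch_size=16, learning_rate=1e-4)
    Trainer(model=model, args=args, data_collator=collator,
            train_dataset=tokenized).train()   # continual pre-training on this stage

# Only afterwards is the continually pre-trained encoder fine-tuned on downstream tasks.
```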