Continual Pretraining

27 papers with code • 3 benchmarks • 3 datasets

Continual pretraining (also called continued or domain-adaptive pretraining) takes an already-pretrained model and keeps training it with its original self-supervised objective on new unlabeled data, such as a new domain, a stream of tasks, or longer sequences, so that the model adapts to the new distribution while preserving its general capabilities.
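
In its simplest form, continual pretraining just resumes the model's original objective (here, causal language modeling) on the new corpus. Below is a minimal illustrative sketch using PyTorch and Hugging Face Transformers; the checkpoint name, learning rate, batching, and placeholder corpus are assumptions for illustration, not a recipe from any paper listed on this page.

```python
# Minimal continual-pretraining sketch: resume causal-LM training on new-domain text.
# Checkpoint name, hyperparameters, and the corpus are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder: any pretrained causal-LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style tokenizers have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# New-domain corpus (placeholder): raw text documents from the target domain.
domain_texts = ["example document from the new domain ..."]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # small LR to limit forgetting

for i in range(0, len(domain_texts), 8):
    enc = tokenizer(
        domain_texts[i:i + 8],
        truncation=True,
        max_length=512,
        padding=True,
        return_tensors="pt",
    )
    # Standard causal-LM objective; padded positions are ignored via label -100.
    labels = enc["input_ids"].masked_fill(enc["attention_mask"] == 0, -100)
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```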

Most implemented papers

Continual Training of Language Models for Few-Shot Learning

uic-liu-lab/cpt 11 Oct 2022

Recent work applying large language models (LMs) has achieved impressive performance in many NLP applications.

Rho-1: Not All Tokens Are What You Need

microsoft/rho 11 Apr 2024

After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens.
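
The idea behind the title is selective language modeling: train only on the tokens judged most useful, roughly those with high excess loss relative to a reference model trained on high-quality data. A hedged sketch of such a token-level loss mask follows; the keep ratio, function names, and details are illustrative, not the paper's exact implementation.

```python
# Sketch of token-selective training in the spirit of Rho-1's selective language
# modeling: keep only the tokens with the highest excess loss relative to a
# reference model. Function names and the keep ratio are illustrative.
import torch
import torch.nn.functional as F

def token_losses(model, input_ids, attention_mask):
    """Per-token cross-entropy losses under a causal LM (no reduction)."""
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    )
    return loss.view(shift_labels.shape)  # (batch, seq_len - 1)

def selective_lm_loss(model, ref_model, input_ids, attention_mask, keep_ratio=0.6):
    """Average the training loss only over the keep_ratio fraction of tokens with
    the largest (training loss - reference loss); tokens the reference model
    already predicts well receive zero weight."""
    with torch.no_grad():
        ref_loss = token_losses(ref_model, input_ids, attention_mask)
    train_loss = token_losses(model, input_ids, attention_mask)
    excess = train_loss.detach() - ref_loss
    k = max(1, int(keep_ratio * excess.numel()))
    threshold = excess.flatten().topk(k).values.min()
    mask = (excess >= threshold).float()
    return (train_loss * mask).sum() / mask.sum().clamp(min=1.0)
```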

ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning

pluslabnlp/econet EMNLP 2021

While pre-trained language models (PTLMs) have achieved noticeable success on many NLP tasks, they still struggle with tasks that require event temporal reasoning, which is essential for event-centric applications.

Continual Pre-training of Language Models

UIC-Liu-Lab/ContinualLM 7 Feb 2023

A novel proxy is also proposed to preserve the general knowledge in the original LM.

Towards Geospatial Foundation Models via Continual Pretraining

mmendiet/gfm ICCV 2023

Geospatial technologies are becoming increasingly essential in our world for a wide range of applications, including agriculture, urban planning, and disaster response.

Effective Long-Context Scaling of Foundation Models

openlmlab/leval 27 Sep 2023

We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths. Our ablation experiments suggest that having abundant long texts in the pretraining dataset is not the key to achieving strong performance, and we empirically verify that long-context continual pretraining is more efficient and similarly effective compared to pretraining from scratch with long sequences.
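
A common recipe for this kind of long-context continual pretraining is to take an existing checkpoint, enlarge its positional range (for rotary embeddings, by raising the base frequency), and resume training on longer sequences rather than retraining from scratch. A hedged sketch of that setup follows; the checkpoint name, context length, and RoPE base are placeholders, not the paper's exact values.

```python
# Sketch: continual pretraining for longer context by adjusting rotary-embedding
# settings and resuming training on long sequences. Checkpoint name, rope_theta,
# and context length are illustrative placeholders, not the paper's recipe.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

base_model = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint with rotary embeddings

config = AutoConfig.from_pretrained(base_model)
config.max_position_embeddings = 32768   # extend the supported context window
config.rope_theta = 1_000_000.0          # larger RoPE base so long positions stay distinguishable

model = AutoModelForCausalLM.from_pretrained(base_model, config=config)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# From here, training proceeds like ordinary causal-LM continual pretraining
# (see the sketch at the top of this page), except that batches are packed into
# long sequences, e.g. tokenizer(..., truncation=True, max_length=32768).
```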

Autonomous Data Selection with Language Models for Mathematical Texts

hiyouga/llama-factory 12 Feb 2024

Our method delivers a twofold increase in pretraining token efficiency compared to state-of-the-art baselines, underscoring its potential to enhance models' mathematical reasoning capabilities.
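
One hedged way to implement this kind of LM-driven data selection is to ask a scoring model a yes/no quality question about each document and keep those with a high answer probability. In the sketch below, the prompt wording, scoring model, and threshold are illustrative placeholders rather than the paper's exact pipeline.

```python
# Sketch: LM-based selection of mathematical pretraining texts. Prompt, scoring
# rule, and threshold are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

scorer_name = "gpt2"  # placeholder scoring LM
tokenizer = AutoTokenizer.from_pretrained(scorer_name)
scorer = AutoModelForCausalLM.from_pretrained(scorer_name).eval()

PROMPT = (
    "Does the following text contain clear, correct mathematical reasoning "
    "that would be useful for pretraining? Answer YES or NO.\n\n{doc}\n\nAnswer:"
)

@torch.no_grad()
def quality_score(doc: str) -> float:
    """Relative probability of YES vs. NO as the next token after the prompt."""
    enc = tokenizer(PROMPT.format(doc=doc), return_tensors="pt",
                    truncation=True, max_length=1024)
    logits = scorer(**enc).logits[0, -1]
    yes_id = tokenizer(" YES", add_special_tokens=False)["input_ids"][0]
    no_id = tokenizer(" NO", add_special_tokens=False)["input_ids"][0]
    probs = torch.softmax(logits[[yes_id, no_id]], dim=-1)
    return probs[0].item()

def select(corpus, threshold=0.5):
    """Keep documents the scoring LM judges useful; the threshold is a placeholder."""
    return [doc for doc in corpus if quality_score(doc) > threshold]
```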

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

bionlu-coling2024/biomed-ner-intent_detection 31 Jul 2020

In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.

Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning

vano1205/efficientcl EMNLP 2021

We introduce EfficientCL, a memory-efficient continual pretraining method that applies contrastive learning with novel data augmentation and curriculum learning.
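
A minimal sketch of a contrastive continual-pretraining step with a curriculum over augmentation strength is given below; the InfoNCE loss is standard, but the augmentation is a generic feature-dropout placeholder standing in for the paper's specific operators, and the linear schedule is illustrative.

```python
# Sketch: contrastive continual pretraining with a curriculum over augmentation
# strength (loosely inspired by EfficientCL; the augmentation is a generic
# placeholder, not the paper's cutoff / PCA-jittering operators).
import torch
import torch.nn.functional as F

def augment(hidden, strength):
    """Placeholder augmentation: random feature dropout scaled by `strength`."""
    keep = (torch.rand_like(hidden) > strength).float()
    return hidden * keep

def info_nce(z1, z2, temperature=0.05):
    """Standard InfoNCE: matching rows of z1 and z2 are positives."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def contrastive_step(encoder, batch, step, total_steps, max_strength=0.3):
    """encoder: a BERT-style AutoModel; batch: its tokenized inputs."""
    # Curriculum: augmentation strength grows linearly from 0 to max_strength.
    strength = max_strength * step / total_steps
    h = encoder(**batch).last_hidden_state[:, 0]  # [CLS]-style sentence embedding
    z1, z2 = augment(h, strength), augment(h, strength)
    return info_nce(z1, z2)
```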

On the Robustness of Reading Comprehension Models to Entity Renaming

ink-usc/entity-robustness NAACL 2022

We study the robustness of machine reading comprehension (MRC) models to entity renaming -- do models make more wrong predictions when the same questions are asked about an entity whose name has been changed?