Continual Pretraining
27 papers with code • 3 benchmarks • 3 datasets
Most implemented papers
Continual Training of Language Models for Few-Shot Learning
Recent work applying large language models (LMs) has achieved impressive performance in many NLP applications.
Rho-1: Not All Tokens Are What You Need
After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens.
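Rho-1's central idea is selective language modeling: instead of applying the next-token loss uniformly to every token, tokens are scored and only the more useful ones contribute to the loss. Below is a minimal PyTorch sketch of a token-selective cross-entropy, assuming the per-token scores (e.g. excess loss over a reference model) are computed elsewhere; the function name, keep ratio, and scoring scheme are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def selective_lm_loss(logits, labels, token_scores, keep_ratio=0.6):
    """Cross-entropy over only the highest-scoring tokens in the batch.

    logits:       (batch, seq_len, vocab) next-token logits from the model
    labels:       (batch, seq_len) target token ids, already shifted
    token_scores: (batch, seq_len) per-token utility scores, assumed to be
                  produced by some reference/scoring model elsewhere
    keep_ratio:   fraction of tokens that contribute to the loss (illustrative)
    """
    # Per-token cross-entropy, no reduction yet.
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        reduction="none",
    ).reshape(labels.shape)

    # Keep only the top-k tokens by score; mask out the rest.
    k = max(1, int(keep_ratio * token_scores.numel()))
    threshold = torch.topk(token_scores.flatten(), k).values.min()
    mask = (token_scores >= threshold).float()

    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
```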
ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning
While pre-trained language models (PTLMs) have achieved noticeable success on many NLP tasks, they still struggle with tasks that require event temporal reasoning, which is essential for event-centric applications.
Continual Pre-training of Language Models
A novel proxy is also proposed to preserve the general knowledge in the original LM.
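As a rough illustration of continual pretraining with a knowledge-preservation term, the sketch below adds a generic distillation-style penalty toward a frozen copy of the original LM; this is a common recipe for the general idea, not the specific proxy proposed in the paper, and the weight and temperature values are placeholders.

```python
import torch
import torch.nn.functional as F

def continual_loss(student_logits, frozen_logits, labels, alpha=0.1, temperature=2.0):
    """Domain-adaptive LM loss plus a KL penalty toward the original (frozen) LM.

    student_logits: (batch, seq_len, vocab) from the model being continually pretrained
    frozen_logits:  (batch, seq_len, vocab) from a frozen copy of the original LM
    labels:         (batch, seq_len) next-token targets from the new-domain corpus
    alpha:          weight of the knowledge-preservation term (illustrative value)
    """
    # Standard next-token loss on the new-domain data.
    lm_loss = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        labels.reshape(-1),
    )

    # KL divergence keeps the updated model close to its original behaviour.
    t = temperature
    preserve = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.log_softmax(frozen_logits / t, dim=-1),
        log_target=True,
        reduction="batchmean",
    ) * (t * t)

    return lm_loss + alpha * preserve
```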
Towards Geospatial Foundation Models via Continual Pretraining
Geospatial technologies are becoming increasingly essential in our world for a wide range of applications, including agriculture, urban planning, and disaster response.
Effective Long-Context Scaling of Foundation Models
We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths. Our ablation experiments suggest that having abundant long texts in the pretraining dataset is not the key to achieving strong performance, and we empirically verify that long-context continual pretraining is more efficient and similarly effective compared to pretraining from scratch with long sequences.
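On the data side, long-context continual pretraining typically starts from sequence packing: tokenized documents are concatenated into a stream and sliced into long fixed-length training examples, and the causal LM loss is then applied exactly as during the original short-context pretraining. A minimal sketch follows, where the 32k sequence length and the separator token id are illustrative choices rather than the paper's configuration.

```python
from typing import Iterable, Iterator
import torch

def pack_long_sequences(
    tokenized_docs: Iterable[list[int]],
    seq_len: int = 32768,   # target long-context length for continual pretraining
    eos_id: int = 2,        # document separator token id (model-specific)
) -> Iterator[torch.Tensor]:
    """Concatenate tokenized documents and emit fixed-length long chunks.

    Each yielded tensor is one long training example for continued causal LM training.
    """
    buffer: list[int] = []
    for doc in tokenized_docs:
        buffer.extend(doc)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= seq_len:
            chunk, buffer = buffer[:seq_len], buffer[seq_len:]
            yield torch.tensor(chunk, dtype=torch.long)
```

Model-side, the positional encoding usually also needs to be adapted to the longer window (e.g. a larger RoPE base frequency); that step is orthogonal to the packing shown here.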
Autonomous Data Selection with Language Models for Mathematical Texts
Our method achieves a twofold increase in pretraining token efficiency compared to state-of-the-art baselines, underscoring its potential for enhancing models' mathematical reasoning capabilities.
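The general recipe is to let a language model itself score candidate documents and keep only those judged useful for mathematics-focused continual pretraining. A hedged sketch using Hugging Face transformers is shown below; the model name, prompt wording, and 0.75 threshold are placeholder choices for illustration and not the paper's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model and prompt; swap in the scoring model and wording of your choice.
MODEL_NAME = "gpt2"
PROMPT = (
    "Does the following text contain useful mathematical reasoning or "
    "educational math content? Answer YES or NO.\n\nText: {text}\n\nAnswer:"
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

# Ids of the first token of " YES" / " NO" under this tokenizer.
YES_ID = tokenizer(" YES", add_special_tokens=False).input_ids[0]
NO_ID = tokenizer(" NO", add_special_tokens=False).input_ids[0]

@torch.no_grad()
def math_score(text: str) -> float:
    """Relative probability the LM assigns to answering YES for this document."""
    inputs = tokenizer(PROMPT.format(text=text[:2000]), return_tensors="pt")
    logits = model(**inputs).logits[0, -1]          # next-token logits
    yes, no = logits[YES_ID], logits[NO_ID]
    return torch.softmax(torch.stack([yes, no]), dim=0)[0].item()

corpus = [
    "We prove that the sum of the first n odd numbers equals n^2 by induction.",
    "Click here to subscribe to our newsletter for daily coupons!",
]
# Keep only documents scoring above the (illustrative) threshold.
selected = [d for d in corpus if math_score(d) > 0.75]
```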
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.
Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning
We introduce EfficientCL, a memory-efficient continual pretraining method that applies contrastive learning with novel data augmentation and curriculum learning.
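Below is a minimal sketch of the two ingredients named in the abstract: an in-batch contrastive (InfoNCE) loss over two augmented views, and a curriculum that ramps up augmentation strength over training. The specific augmentation (token cutoff), ratios, and schedule here are illustrative and may differ from EfficientCL's.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.05):
    """In-batch contrastive loss between two views of the same sentences.

    z1, z2: (batch, dim) sentence embeddings of the two augmented views.
    """
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                 # (batch, batch) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def curriculum_ratio(step, total_steps, max_ratio=0.20):
    """Curriculum schedule: augmentation strength grows as training progresses."""
    return max_ratio * min(1.0, step / max(1, total_steps))

def token_cutoff(embeddings, ratio):
    """Token-cutoff augmentation: zero out a random fraction of token embeddings.

    embeddings: (batch, seq_len, dim) input embeddings before the encoder.
    """
    batch, seq_len, _ = embeddings.shape
    drop = torch.rand(batch, seq_len, 1, device=embeddings.device) < ratio
    return embeddings.masked_fill(drop, 0.0)
```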
On the Robustness of Reading Comprehension Models to Entity Renaming
We study the robustness of machine reading comprehension (MRC) models to entity renaming -- do models make more wrong predictions when the same questions are asked about an entity whose name has been changed?