When does Further Pre-training MLM Help? An Empirical Study on Task-Oriented Dialog Pre-training

Further pre-training language models on in-domain data (domain-adaptive pre-training, DAPT) or task-relevant data (task-adaptive pre-training, TAPT) before fine-tuning has been shown to improve downstream tasks’ performances. However, in task-oriented dialog modeling, we observe that further pre-training MLM does not always boost the performance on a downstream task. We find that DAPT is beneficial in the low-resource setting, but as the fine-tuning data size grows, DAPT becomes less beneficial or even useless, and scaling the size of DAPT data does not help. Through Representational Similarity Analysis, we conclude that more data for fine-tuning yields greater change of the model’s representations and thus reduces the influence of initialization.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here