EMNLP 2021 • Nikolay Arefyev, Dmitrii Kharchev, Artem Shelmanov
While Masked Language Models (MLMs) are pre-trained on massive datasets, additional training with the MLM objective on domain- or task-specific data before fine-tuning for the final task is known to improve the final performance.
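This domain-adaptive step reuses the same masked-token objective as pre-training. As a minimal, illustrative sketch (not the authors' code), BERT-style masking — select ~15% of positions as prediction targets, then replace 80% of those with a mask token, 10% with a random token, and leave 10% unchanged — can be written as follows; `vocab_size` and `mask_id` are placeholder parameters standing in for a real tokenizer's values:

```python
import random

def mask_tokens(token_ids, vocab_size, mask_id, mask_prob=0.15, rng=None):
    """BERT-style MLM masking (illustrative sketch).

    Selects roughly `mask_prob` of positions as prediction targets;
    of those, 80% become the mask token, 10% a random token, and
    10% keep the original token.
    """
    rng = rng or random.Random(0)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)  # -100 marks positions ignored by the loss
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_id          # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # 10%: random token
            # else: 10% keep the original token unchanged
    return inputs, labels

# Toy usage with made-up token ids
corrupted, targets = mask_tokens(list(range(20)), vocab_size=1000, mask_id=999,
                                 rng=random.Random(42))
```

During domain-adaptive training, the corrupted sequence is fed to the model and the cross-entropy loss is computed only at the positions where `labels` is not -100.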