Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Task Analysis

24 May 2018 · Kelly W. Zhang, Samuel R. Bowman

There is mounting evidence that pretraining can be valuable for neural network language understanding models, but we do not yet have a clear understanding of how the choice of pretraining objective affects the type of linguistic information that models learn. With this in mind, we compare four objectives---language modeling, translation, skip-thought, and autoencoding---on their ability to induce syntactic and part-of-speech information, holding constant the genre and quantity of training data. We find that representations from language models consistently perform best on our syntactic auxiliary prediction tasks, even when trained on relatively small amounts of data, which suggests that language modeling may be the best data-rich pretraining task for transfer learning applications requiring syntactic information. We also find that a randomly-initialized, frozen model can perform strikingly well on our auxiliary tasks, but that this effect disappears when the amount of training data for the auxiliary tasks is reduced.
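
To make the auxiliary-task setup concrete, below is a minimal sketch of a probing classifier of the kind the abstract describes: a small linear predictor trained on top of a frozen LSTM encoder (pretrained or, as in the paper's baseline, randomly initialized) to recover part-of-speech tags. This is an illustrative assumption-laden sketch in PyTorch, not the authors' code; the class name, dimensions, and toy data are all hypothetical.

```python
# Hypothetical probing-task sketch: train only a linear POS probe on top of a
# frozen LSTM encoder. Sizes and data are toy placeholders, not from the paper.
import torch
import torch.nn as nn

VOCAB_SIZE, EMB_DIM, HID_DIM, NUM_TAGS = 1000, 64, 128, 17  # hypothetical sizes

class FrozenEncoderProbe(nn.Module):
    """Linear POS probe over the per-token states of a frozen LSTM encoder."""
    def __init__(self, encoder: nn.LSTM, embedding: nn.Embedding):
        super().__init__()
        self.embedding = embedding
        self.encoder = encoder
        # Freeze embeddings and encoder: only the probe's weights are trained.
        for p in list(self.encoder.parameters()) + list(self.embedding.parameters()):
            p.requires_grad = False
        self.probe = nn.Linear(HID_DIM, NUM_TAGS)

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        with torch.no_grad():
            states, _ = self.encoder(self.embedding(token_ids))
        return self.probe(states)                # (batch, seq_len, NUM_TAGS)

# Randomly initialized, frozen encoder -- the surprising baseline from the
# abstract. A pretrained LM / translation / skip-thought / autoencoder encoder
# would be loaded here instead for the comparison.
embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)
encoder = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)
model = FrozenEncoderProbe(encoder, embedding)

optimizer = torch.optim.Adam(model.probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy batch: random token ids and random gold tags, purely for illustration.
tokens = torch.randint(0, VOCAB_SIZE, (8, 20))
tags = torch.randint(0, NUM_TAGS, (8, 20))

for _ in range(3):                               # a few probe-training steps
    logits = model(tokens)
    loss = loss_fn(logits.reshape(-1, NUM_TAGS), tags.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Under this setup, auxiliary-task accuracy measures how much syntactic or part-of-speech information is linearly recoverable from the frozen representations, which is how the paper compares the four pretraining objectives; varying the amount of probe training data is what separates genuinely informative encoders from the randomly initialized baseline.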
