Extrapolation in NLP

We argue that extrapolation to examples outside the training space will often be easier for models that capture global structure than for models that merely maximise their local fit to the training data. We show that this holds for two popular models: the Decomposable Attention Model and word2vec.
