Extrapolation in NLP
We argue that extrapolation to examples outside the training space will often be easier for models that capture global structures rather than just maximise their local fit to the training data. We show that this is true for two popular models: the Decomposable Attention Model and word2vec.
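To make the global-structure vs. local-fit contrast concrete, here is a minimal toy sketch (not drawn from the paper, and simpler than its NLP experiments): a model constrained to the correct global form of the data, here linearity, extrapolates beyond the training range, while a flexible model that merely maximises fit inside that range does not. All data and degree choices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data confined to [0, 1]; the true relation is globally linear.
x_train = rng.uniform(0.0, 1.0, size=50)
y_train = 2.0 * x_train + 0.5 + 0.05 * rng.standard_normal(50)

# "Global" model: degree-1 fit matching the true structure.
# "Local" model: degree-9 fit that tracks the training points closely.
global_fit = np.polyfit(x_train, y_train, deg=1)
local_fit = np.polyfit(x_train, y_train, deg=9)

# Evaluate well outside the training range.
x_test = np.linspace(1.5, 2.5, 100)
y_test = 2.0 * x_test + 0.5

mse = lambda coeffs: np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print(f"global (linear) extrapolation MSE:  {mse(global_fit):.4f}")
print(f"local (degree-9) extrapolation MSE: {mse(local_fit):.4f}")
```

Both models fit the training interval almost equally well, but the high-degree polynomial diverges rapidly outside it, which is the failure mode the paper attributes to models that only maximise local fit.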