Enriching Word Vectors with Subword Information

TACL 2017 Piotr Bojanowski • Edouard Grave • Armand Joulin • Tomas Mikolov

Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. The paper proposes a new approach based on the skipgram model, in which each word is represented as a bag of character $n$-grams. A vector representation is associated with each character $n$-gram, and words are represented as the sum of these representations.
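The core idea can be illustrated with a minimal sketch: extract a word's character $n$-grams (with boundary markers, as the model does) and sum their vectors to form the word vector. This is not the paper's implementation; the function names, the vector dimension, and the n-gram range defaults here are illustrative.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """All character n-grams of a word wrapped in boundary markers
    '<' and '>', plus the full bracketed word as a special sequence."""
    w = f"<{word}>"
    grams = [w[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)]
    grams.append(w)  # the whole word is kept as its own unit
    return grams

def word_vector(word, ngram_vecs, dim=4):
    """Word vector as the sum of its n-gram vectors.
    `ngram_vecs` maps n-gram -> np.ndarray; unseen n-grams contribute
    nothing (a simplification of hashing n-grams into a fixed table)."""
    vec = np.zeros(dim)
    for g in char_ngrams(word):
        if g in ngram_vecs:
            vec += ngram_vecs[g]
    return vec
```

Because the word vector is built from shared subword units, rare and unseen words still receive a representation as long as their n-grams appeared in training.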

