MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP

LREC 2016 · Alexandre Bérard, Christophe Servan, Olivier Pietquin, Laurent Besacier

We present MultiVec, a new toolkit for computing continuous representations of text at different levels of granularity (single words or sequences of words). MultiVec includes the features of word2vec, paragraph vector (batch and online), and bivec for bilingual distributed representations. It also provides several distance measures between words and between sequences of words. The toolkit is written in C++ and is designed to be fast (within the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: analogical reasoning, sentiment analysis, and crosslingual document classification.
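
To make the idea of a distance measure between words and between sequences of words concrete, here is a minimal Python sketch. It is not MultiVec's API: the function names, the toy two-dimensional embeddings, and the choice of cosine similarity over averaged word vectors are all assumptions made for illustration. Averaging is a simple stand-in for a sequence representation and is not the paragraph vector model.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sequence_vector(tokens: Sequence[str],
                    embeddings: Dict[str, Vector],
                    dim: int) -> Vector:
    """Average the vectors of the known tokens (hypothetical baseline).

    Unknown tokens are skipped; an all-unknown sequence maps to the zero vector.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy embeddings invented for this example; real vectors would come from a
# trained model.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "good": [1.0, 0.0],
    "great": [0.9, 0.1],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}

if __name__ == "__main__":
    # Word-level similarity.
    print(cosine_similarity(TOY_EMBEDDINGS["good"], TOY_EMBEDDINGS["great"]))
    # Sequence-level similarity using averaged word vectors.
    a = sequence_vector(["good", "movie"], TOY_EMBEDDINGS, 2)
    b = sequence_vector(["great", "movie"], TOY_EMBEDDINGS, 2)
    print(cosine_similarity(a, b))
```

With these toy vectors, "good movie" scores closer to "great movie" than to "bad movie", which is the kind of comparison a word-level or sequence-level similarity measure is meant to support.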