Finally, we introduce a new test set of aligned sentences in 122 languages based on the Tatoeba corpus, and show that our sentence embeddings obtain strong results in multilingual similarity search even for low-resource languages.
CROSS-LINGUAL BITEXT MINING CROSS-LINGUAL DOCUMENT CLASSIFICATION CROSS-LINGUAL NATURAL LANGUAGE INFERENCE CROSS-LINGUAL TRANSFER DOCUMENT CLASSIFICATION JOINT MULTILINGUAL SENTENCE REPRESENTATIONS PARALLEL CORPUS MINING
Our adversarial post-specialization method propagates the external lexical knowledge to the full distributional space.
Argumentation mining (AM) requires the identification of complex discourse structures and has lately been applied with success monolingually.
We show that the notions of concreteness and imageability are highly predictable both within and across languages, with a moderate loss of up to 20% in correlation when predicting across languages.