|Trend||Dataset||Best Method||Paper title||Paper||Code||Compare|
Finally, we introduce a new test set of aligned sentences in 122 languages based on the Tatoeba corpus, and show that our sentence embeddings obtain strong results in multilingual similarity search even for low-resource languages.
CROSS-LINGUAL BITEXT MINING CROSS-LINGUAL DOCUMENT CLASSIFICATION CROSS-LINGUAL NATURAL LANGUAGE INFERENCE CROSS-LINGUAL TRANSFER DOCUMENT CLASSIFICATION JOINT MULTILINGUAL SENTENCE REPRESENTATIONS PARALLEL CORPUS MINING
We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data.
In addition, we have observed that the class prior distributions differ significantly between the languages.
We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings.
Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP.
To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exists.