|TREND||DATASET||BEST METHOD||PAPER TITLE||PAPER||CODE||COMPARE|
We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.
CROSS-LINGUAL BITEXT MINING CROSS-LINGUAL DOCUMENT CLASSIFICATION CROSS-LINGUAL NATURAL LANGUAGE INFERENCE CROSS-LINGUAL TRANSFER DOCUMENT CLASSIFICATION JOINT MULTILINGUAL SENTENCE REPRESENTATIONS PARALLEL CORPUS MINING
Recent advances in cross-lingual word embeddings have primarily relied on mapping-based methods, which project pretrained word embeddings from different languages into a shared space through a linear transformation.
Pretrained language models are promising particularly for low-resource languages as they only require unlabelled data.
In addition, we have observed that the class prior distributions differ significantly between the languages.
We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data.
We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings.
Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP.
To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exists.
We consider the setting of semi-supervised cross-lingual understanding, where labeled data is available in a source language (English), but only unlabeled data is available in the target language.