Multilingual Word Embeddings

19 papers with code • 0 benchmarks • 0 datasets

Multilingual word embeddings map words from multiple languages into a shared vector space, so that translation equivalents lie close together and lexical knowledge can transfer across languages.

Most implemented papers

CS-Embed at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis

francesita/CS-Embed-SemEval2020 SEMEVAL 2020

The growing popularity and applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts that mix multiple languages, a practice known as code-switching.

Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages

amsuhane/Debiasing-Multilingual-Word-Embeddings-A-Case-Study-of-Three-Indian-Languages 21 Jul 2021

In this paper, we advance the current state-of-the-art method for debiasing monolingual word embeddings so as to generalize well in a multilingual setting.
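For context on the kind of method being extended, below is a minimal sketch of the classic hard-debiasing projection (removing the component of each word vector along an estimated bias direction). It is not the paper's multilingual procedure, and all vectors here are random placeholders.

```python
import numpy as np

def hard_debias(vectors, bias_direction):
    """Project out the component of each word vector along a bias direction.

    Classic hard-debiasing step, shown only as the kind of monolingual
    baseline this line of work extends, not the paper's multilingual method.
    """
    b = bias_direction / np.linalg.norm(bias_direction)   # unit bias axis
    return vectors - np.outer(vectors @ b, b)              # remove that component

# Placeholder 300-d vectors; in practice the bias direction is estimated from
# definitional pairs (e.g. gendered word pairs) in each language.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((5, 300))
bias_dir = rng.standard_normal(300)
debiased = hard_debias(vectors, bias_dir)
```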

Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision

facebookresearch/vissl 16 Feb 2022

Discriminative self-supervised learning allows models to be trained on any random group of internet images, potentially recovering salient information that helps differentiate between them.

Improving Word Translation via Two-Stage Contrastive Learning

cambridgeltl/contrastivebli ACL 2022

At Stage C1, we propose to refine standard cross-lingual linear maps between static word embeddings (WEs) via a contrastive learning objective; we also show how to integrate it into the self-learning procedure for even more refined cross-lingual maps.
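As a rough illustration of the Stage C1 idea, the sketch below initializes a cross-lingual linear map with a closed-form Procrustes solution and then refines it with an InfoNCE-style contrastive loss over in-batch negatives. The paper's exact objective, negative sampling, and self-learning loop are not reproduced here, and the embeddings are random placeholders.

```python
import torch
import torch.nn.functional as F

def procrustes(X, Y):
    """Closed-form orthogonal map W minimizing ||XW - Y||_F (standard Procrustes)."""
    U, _, Vt = torch.linalg.svd(X.T @ Y)
    return U @ Vt

def contrastive_refine(X, Y, W, epochs=10, temperature=0.1, lr=1e-3):
    """Refine the linear map with an InfoNCE-style loss over in-batch negatives.

    Illustrative only: the paper's C1 objective, negative sampling and
    self-learning loop are more involved than this toy loop.
    """
    W = torch.nn.Parameter(W.clone())
    opt = torch.optim.Adam([W], lr=lr)
    for _ in range(epochs):
        src = F.normalize(X @ W, dim=-1)            # mapped source embeddings
        tgt = F.normalize(Y, dim=-1)                # target embeddings
        logits = src @ tgt.T / temperature          # similarity of every source to every target
        labels = torch.arange(len(X))               # the i-th target is the positive for source i
        loss = F.cross_entropy(logits, labels)
        opt.zero_grad(); loss.backward(); opt.step()
    return W.detach()

# Toy usage: random 300-d vectors standing in for seed translation pairs.
X, Y = torch.randn(1000, 300), torch.randn(1000, 300)
W = contrastive_refine(X, Y, procrustes(X, Y))
```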

Improving Bilingual Lexicon Induction with Cross-Encoder Reranking

cambridgeltl/BLICEr 30 Oct 2022

This crucial step is done via 1) creating a word similarity dataset, comprising positive word pairs (i.e., true translations) and hard negative pairs induced from the original CLWE space, and then 2) fine-tuning an mPLM (e.g., mBERT or XLM-R) in a cross-encoder manner to predict the similarity scores.
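A hedged sketch of the cross-encoder fine-tuning step, using XLM-R through Hugging Face Transformers; the word pairs, input format, and training loop below are illustrative placeholders rather than the paper's actual recipe.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder pairs: a true translation (positive) and a nearest neighbour
# from the CLWE space that is not a translation (hard negative).
pairs = [("dog", "perro", 1.0), ("dog", "gato", 0.0)]

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=1)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

for src, tgt, label in pairs:
    # Cross-encoder: both words are encoded jointly by the mPLM, so the model
    # can attend across the pair when predicting the similarity score.
    inputs = tokenizer(src, tgt, return_tensors="pt")
    score = model(**inputs).logits.squeeze(-1)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        score, torch.tensor([label]))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```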

Language Embeddings Sometimes Contain Typological Generalizations

robertostling/parallel-text-typology 19 Jan 2023

To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned?

OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining

cisnlp/ofa 15 Nov 2023

Instead of pretraining multilingual language models from scratch, a more efficient method is to adapt existing pretrained language models (PLMs) to new languages via vocabulary extension and continued pretraining.
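A minimal sketch of the vocabulary-extension step this approach builds on: new subword tokens are added, the embedding matrix is resized, and the unseen rows are initialized. The example tokens and the mean-of-existing-rows initialization are placeholders, not OFA's own initialization scheme.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

# Hypothetical new-language subwords to add before continued pretraining.
new_subwords = ["▁дар", "▁китоб"]
num_added = tokenizer.add_tokens(new_subwords)
model.resize_token_embeddings(len(tokenizer))

with torch.no_grad():
    emb = model.get_input_embeddings().weight
    old_size = len(tokenizer) - num_added
    # Mean-of-existing-rows init is only a stand-in for OFA's informed
    # initialization of the unseen subword embeddings.
    emb[old_size:] = emb[:old_size].mean(dim=0)

# Continued pretraining (masked language modelling on new-language text) would follow.
```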