Multilingual Word Embeddings

19 papers with code • 0 benchmarks • 0 datasets

Multilingual word embeddings represent words from several languages in a single shared vector space, enabling cross-lingual transfer, bilingual lexicon induction, and comparison of meaning across languages.

OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining

cisnlp/ofa 15 Nov 2023

Instead of pretraining multilingual language models from scratch, a more efficient method is to adapt existing pretrained language models (PLMs) to new languages via vocabulary extension and continued pretraining.

11
15 Nov 2023
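
As a rough illustration of the vocabulary-extension idea above, the sketch below initializes embeddings for unseen target-language subwords from related source subwords. The toy vocabulary, the simple averaging rule, and the hand-picked subword relations are illustrative assumptions, not OFA's actual factorized initialization.

```python
# A minimal sketch of vocabulary extension with informed initialization of new
# subword embeddings. NOT OFA's exact method; the toy vocabulary, the averaging
# rule, and the subword relations below are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

# Toy "pretrained" vocabulary and embedding matrix (vocab_size x dim).
old_vocab = {"_the": 0, "_hous": 1, "e": 2, "_cat": 3}
old_emb = rng.normal(size=(len(old_vocab), 8))

# New target-language subwords, each mapped to similar source subwords (in OFA
# this mapping comes from external multilingual static vectors; here it is
# hand-specified for illustration).
new_subwords = {
    "_haus": ["_hous", "e"],   # German "Haus" ~ English "house"
    "_katze": ["_cat"],        # German "Katze" ~ English "cat"
}

# Initialize each unseen subword as the mean of its related source embeddings;
# fall back to a small random vector when no relative is known.
rows = []
for subword, relatives in new_subwords.items():
    if relatives:
        rows.append(old_emb[[old_vocab[r] for r in relatives]].mean(axis=0))
    else:
        rows.append(rng.normal(scale=0.02, size=old_emb.shape[1]))

extended_emb = np.vstack([old_emb, np.stack(rows)])
print(extended_emb.shape)  # (6, 8): original 4 subwords + 2 new ones
```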

Language Embeddings Sometimes Contain Typological Generalizations

robertostling/parallel-text-typology 19 Jan 2023

To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned?

4
19 Jan 2023

Improving Bilingual Lexicon Induction with Cross-Encoder Reranking

cambridgeltl/BLICEr 30 Oct 2022

This crucial step is done via 1) creating a word similarity dataset, comprising positive word pairs (i.e., true translations) and hard negative pairs induced from the original CLWE space, and then 2) fine-tuning an mPLM (e.g., mBERT or XLM-R) in a cross-encoder manner to predict the similarity scores.

13
30 Oct 2022
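
The pair-construction step described above can be sketched as follows: positives come from a seed dictionary, and hard negatives are nearest CLWE neighbours that are not gold translations. The tiny vocabularies and random vectors are placeholders, and the subsequent cross-encoder fine-tuning of an mPLM is omitted.

```python
# Illustrative pair construction: positives are dictionary translations, hard
# negatives are nearest CLWE neighbours that are not translations. Toy data only.
import numpy as np

rng = np.random.default_rng(0)
src_words = ["dog", "house", "water"]
tgt_words = ["hund", "haus", "wasser", "feuer"]
seed_dict = {"dog": "hund", "house": "haus", "water": "wasser"}

# Stand-ins for aligned cross-lingual word embeddings (one row per word).
src_emb = rng.normal(size=(len(src_words), 16))
tgt_emb = rng.normal(size=(len(tgt_words), 16))

def cosine_sim(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

sim = cosine_sim(src_emb, tgt_emb)

pairs = []  # (source word, target word, label)
for i, s in enumerate(src_words):
    gold = seed_dict[s]
    pairs.append((s, gold, 1.0))                  # positive pair
    for j in np.argsort(-sim[i]):                 # nearest neighbours first
        if tgt_words[j] != gold:
            pairs.append((s, tgt_words[j], 0.0))  # hard negative
            break

print(pairs)
```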

Improving Word Translation via Two-Stage Contrastive Learning

cambridgeltl/contrastivebli ACL 2022

At Stage C1, we propose to refine standard cross-lingual linear maps between static word embeddings (WEs) via a contrastive learning objective; we also show how to integrate it into the self-learning procedure for even more refined cross-lingual maps.

32
15 Mar 2022
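
A minimal sketch of the Stage C1 idea, assuming a seed dictionary of row-aligned source/target vectors: an orthogonal Procrustes map between static WEs, refined with an InfoNCE-style contrastive loss over in-batch negatives. The temperature, step count, and random data are illustrative, and the paper's self-learning loop is not shown.

```python
# Procrustes initialization of a cross-lingual linear map, then contrastive
# refinement with in-batch negatives. Illustrative, not the paper's exact objective.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n = 32, 64                                   # embedding dim, seed-dictionary size
X = F.normalize(torch.randn(n, d), dim=1)       # source WEs (row i translates row i of Y)
Y = F.normalize(torch.randn(n, d), dim=1)       # target WEs

# Closed-form orthogonal Procrustes map: W = U V^T from SVD(Y^T X).
U, _, Vt = torch.linalg.svd(Y.T @ X)
W = torch.nn.Parameter(U @ Vt)

opt = torch.optim.Adam([W], lr=1e-3)
for step in range(100):
    Z = F.normalize(X @ W.T, dim=1)             # mapped source WEs
    logits = Z @ Y.T / 0.05                     # cosine similarities / temperature
    labels = torch.arange(n)                    # i-th source pairs with i-th target
    loss = F.cross_entropy(logits, labels)      # InfoNCE over in-batch negatives
    opt.zero_grad()
    loss.backward()
    opt.step()
```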

Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision

facebookresearch/vissl 16 Feb 2022

Discriminative self-supervised learning allows training models on any random group of internet images, potentially recovering salient information that helps differentiate between the images.

3,227
16 Feb 2022

Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages

amsuhane/Debiasing-Multilingual-Word-Embeddings-A-Case-Study-of-Three-Indian-Languages 21 Jul 2021

In this paper, we advance the current state-of-the-art method for debiasing monolingual word embeddings so as to generalize well in a multilingual setting.

2
21 Jul 2021
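
For context, one classic monolingual debiasing recipe (hard debiasing in the style of Bolukbasi et al.) can be sketched as below: estimate a bias direction from a definitional word pair and project it out of other words' vectors. The toy vectors and the single he/she pair are illustrative assumptions; the paper's multilingual extension is not reproduced here.

```python
# Hard-debiasing sketch: remove the component along an estimated bias direction.
# Toy random vectors; with several definitional pairs one would instead take the
# top principal component of their differences.
import numpy as np

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=16) for w in ["he", "she", "doctor", "nurse"]}

# Bias direction from a single definitional pair: the normalized difference.
bias_dir = emb["he"] - emb["she"]
bias_dir /= np.linalg.norm(bias_dir)

def neutralize(v, direction):
    """Remove the component of v that lies along the bias direction."""
    return v - (v @ direction) * direction

for w in ["doctor", "nurse"]:
    emb[w] = neutralize(emb[w], bias_dir)
    print(w, abs(emb[w] @ bias_dir))  # ~0 after debiasing
```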

CS-Embed at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis

francesita/CS-Embed-SemEval2020 SEMEVAL 2020

The growing popularity and applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts that mix multiple languages, a practice known as code-switching.

3
08 Jun 2020
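
A minimal sketch of one way to obtain code-switched word embeddings, assuming gensim (>=4) is available: train skip-gram vectors directly on tokenized code-switched posts so that words from both languages end up in one space. The toy Spanish-English corpus and hyperparameters are illustrative, not the authors' setup.

```python
# Skip-gram word embeddings trained on a tiny illustrative code-switched corpus.
from gensim.models import Word2Vec

corpus = [
    ["me", "encanta", "this", "song"],
    ["that", "movie", "estuvo", "buena"],
    ["vamos", "al", "party", "tonight"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3,
                 min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("party", topn=2))
```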

SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings

cisnlp/simalign Findings of the Association for Computational Linguistics 2020

We find that alignments created from embeddings are superior for four and comparable for two language pairs compared to those produced by traditional statistical aligners, even with abundant parallel data; e.g., contextualized embeddings achieve a word alignment F1 for English-German that is 5 percentage points higher than eflomal, a high-quality statistical aligner, trained on 100k parallel sentences.

339
18 Apr 2020
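
The alignment idea can be sketched with the mutual-argmax heuristic in the spirit of SimAlign's Argmax method: align token i and token j when they are each other's nearest neighbours in the similarity matrix. In practice the vectors come from a multilingual encoder such as mBERT; the random vectors below are placeholders.

```python
# Embedding-based word alignment via mutual argmax over a similarity matrix.
import numpy as np

rng = np.random.default_rng(0)
src_vecs = rng.normal(size=(4, 32))  # stand-in for source-sentence token embeddings
tgt_vecs = rng.normal(size=(5, 32))  # stand-in for target-sentence token embeddings

def cosine(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

sim = cosine(src_vecs, tgt_vecs)
fwd = sim.argmax(axis=1)   # best target for each source token
bwd = sim.argmax(axis=0)   # best source for each target token

# Keep only mutually best pairs.
alignments = [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]
print(alignments)
```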

Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task

alirezamshi/AME-CMR EMNLP (WS) 2019

In this paper, we propose a new approach to learn multimodal multilingual embeddings for matching images and their relevant captions in two languages.

8
08 Oct 2019
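
A minimal sketch of the common recipe behind such cross-modal retrieval models: project image features and (multilingual) caption embeddings into a shared space and train with a hinge-based ranking loss over in-batch negatives. The projection sizes, margin, and random features are assumptions, not the paper's exact architecture.

```python
# Shared image-caption embedding space trained with a hinge ranking loss.
# Random feature tensors stand in for real image features and caption embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
img_dim, txt_dim, joint_dim, batch = 2048, 300, 256, 8

img_proj = nn.Linear(img_dim, joint_dim)   # projects image features
txt_proj = nn.Linear(txt_dim, joint_dim)   # projects caption embeddings

img_feats = torch.randn(batch, img_dim)
cap_feats = torch.randn(batch, txt_dim)

img_z = F.normalize(img_proj(img_feats), dim=1)
txt_z = F.normalize(txt_proj(cap_feats), dim=1)

sim = img_z @ txt_z.T                      # cosine similarities, batch x batch
pos_col = sim.diag().view(-1, 1)           # positive score for each image row
pos_row = sim.diag().view(1, -1)           # positive score for each caption column
margin = 0.2
mask = torch.eye(batch, dtype=torch.bool)  # exclude the positives themselves

# Hinge violations for image->caption and caption->image retrieval.
cost_i2c = F.relu(margin + sim - pos_col).masked_fill(mask, 0)
cost_c2i = F.relu(margin + sim - pos_row).masked_fill(mask, 0)
loss = cost_i2c.mean() + cost_c2i.mean()
print(loss.item())
```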