Cross-Lingual Transfer
278 papers with code • 1 benchmark • 16 datasets
Cross-lingual transfer is a form of transfer learning that uses the data and models available for a high-resource language (e.g., English) to solve tasks in another, typically lower-resource, language.
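As a rough illustration of the idea, the toy sketch below trains a classifier on labelled English examples only and then applies it unchanged to Spanish inputs. Everything in it is an assumption made for illustration: the `SHARED_EMBEDDINGS` table is a hand-made stand-in for a multilingual encoder, the 2-D vectors are invented, and the nearest-centroid classifier is just one simple choice. It is not taken from any of the papers listed on this page.

```python
from math import dist

# Hypothetical shared embedding space in which translations sit close together.
# The 2-D vectors are invented for illustration, not produced by a real model.
SHARED_EMBEDDINGS = {
    "good": (0.9, 0.1), "great": (0.8, 0.2),
    "bad": (0.1, 0.9), "awful": (0.2, 0.8),
    "bueno": (0.85, 0.15), "genial": (0.75, 0.2),
    "malo": (0.15, 0.85), "horrible": (0.2, 0.9),
}


def embed(sentence):
    """Average the vectors of known words (stand-in for a multilingual encoder)."""
    vectors = [SHARED_EMBEDDINGS[w] for w in sentence.lower().split()
               if w in SHARED_EMBEDDINGS]
    if not vectors:
        raise ValueError(f"no known words in {sentence!r}")
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def fit_centroids(labelled_examples):
    """Compute one mean vector per label from source-language training data."""
    by_label = {}
    for sentence, label in labelled_examples:
        by_label.setdefault(label, []).append(embed(sentence))
    return {
        label: tuple(sum(component) / len(vectors) for component in zip(*vectors))
        for label, vectors in by_label.items()
    }


def predict(centroids, sentence):
    """Return the label whose centroid is nearest to the sentence embedding."""
    vector = embed(sentence)
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Train on English labels only.
english_train = [("good", "pos"), ("great", "pos"), ("bad", "neg"), ("awful", "neg")]
centroids = fit_centroids(english_train)

# Apply to Spanish inputs without using any Spanish labels (zero-shot transfer).
spanish_inputs = ["bueno", "malo", "genial", "horrible"]
spanish_predictions = [predict(centroids, s) for s in spanish_inputs]
print(list(zip(spanish_inputs, spanish_predictions)))
```

The transfer works here only because the invented vectors place translations near each other. In practice, that alignment has to come from a multilingual encoder or from aligned word embeddings.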
Most implemented papers
Unsupervised Cross-lingual Representation Learning at Scale
We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale.
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.
Unsupervised Dense Information Retrieval with Contrastive Learning
In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings.
Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages
This can, for instance, be observed when fine-tuning PLMs on one language and evaluating them on data in a closely related language variety with no standardized orthography.
Pushing the Limits of Low-Resource Morphological Inflection
Recent years have seen exceptional strides in the task of automatic morphological inflection generation.
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing.
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts.
Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings
Zero-resource cross-lingual transfer approaches aim to apply supervised models from a source language to unlabelled target languages.
Don't Just Scratch the Surface: Enhancing Word Representations for Korean with Hanja
We propose a simple yet effective approach for improving Korean word representations using additional linguistic annotation (i.e., Hanja).
End-to-End Slot Alignment and Recognition for Cross-Lingual NLU
We introduce MultiATIS++, a new multilingual NLU corpus that extends the Multilingual ATIS corpus to nine languages across four language families, and evaluate our method using the corpus.