Multilingual NLP
42 papers with code • 0 benchmarks • 8 datasets
Most implemented papers
Unsupervised Cross-lingual Representation Learning at Scale
We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.
Language-agnostic BERT Sentence Embedding
While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding-based transfer learning (Reimers and Gurevych, 2019), BERT-based cross-lingual sentence embeddings have yet to be explored.
UQA: Corpus for Urdu Question Answering
This paper introduces UQA, a novel dataset for question answering and text comprehension in Urdu, a low-resource language with over 70 million native speakers.
MMCR4NLP: Multilingual Multiway Corpora Repository for Natural Language Processing
Multilinguality is gradually becoming ubiquitous, in the sense that more and more researchers have shown that using additional languages helps improve results in many Natural Language Processing tasks.
PMIndia -- A Collection of Parallel Corpora of Languages of India
Parallel text is required for building high-quality machine translation (MT) systems, as well as for other multilingual NLP applications.
XeroAlign: Zero-Shot Cross-lingual Transformer Alignment
The introduction of pretrained cross-lingual language models brought decisive improvements to multilingual NLP tasks.
Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs
ColexNet's nodes are concepts and its edges are colexifications.
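To make the graph structure concrete, here is a minimal sketch of a colexification graph, assuming that two concepts are colexified when a single word form in some language expresses both. The toy lexicon, the concept labels, and the `build_colex_graph` helper are illustrative assumptions, not ColexNet's actual data or construction procedure.

```python
from collections import defaultdict
from itertools import combinations

# Toy lexicon (illustrative only, not taken from ColexNet):
# (language code, word form) -> set of concepts the word can express.
TOY_LEXICON = {
    ("spa", "dedo"): {"FINGER", "TOE"},
    ("por", "dedo"): {"FINGER", "TOE"},
    ("rus", "ruka"): {"HAND", "ARM"},
    ("eng", "hand"): {"HAND"},
}


def build_colex_graph(lexicon):
    """Build an undirected graph: concept -> neighbor concept -> languages.

    Nodes are concepts. An edge links two concepts that share a word form
    in at least one language; the edge stores the set of such languages.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for (lang, _word), concepts in lexicon.items():
        for concept in concepts:
            graph[concept]  # register the node even if it gets no edges
        for a, b in combinations(sorted(concepts), 2):
            graph[a][b].add(lang)
            graph[b][a].add(lang)
    return graph


graph = build_colex_graph(TOY_LEXICON)
print(sorted(graph["FINGER"]["TOE"]))  # languages that colexify FINGER and TOE
```

Storing the set of languages on each edge is one possible design choice: its size can serve as an edge weight, so concept pairs colexified in many languages count as more strongly connected than pairs attested in only one.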
What is "Typological Diversity" in NLP?
We recommend future work to include an operationalization of 'typological diversity' that empirically justifies the diversity of language samples.
Improving Cross-Lingual Word Embeddings by Meeting in the Middle
Cross-lingual word embeddings are becoming increasingly important in multilingual NLP.