Multilingual NLP

17 papers with code • 0 benchmarks • 1 dataset


Most implemented papers

Language-agnostic BERT Sentence Embedding

bojone/labse ACL 2022

While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding-based transfer learning (Reimers and Gurevych, 2019), BERT-based cross-lingual sentence embeddings have yet to be explored.
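Cross-lingual sentence embeddings are typically compared with cosine similarity, for example to find the translation of a sentence among candidates in another language. The sketch below is a generic illustration of that retrieval step, not LaBSE itself: the toy three-dimensional vectors are hypothetical stand-ins for the output of a real encoder, and the helper names `cosine` and `nearest_neighbors` are invented for this example.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(sources: list[list[float]], targets: list[list[float]]) -> list[int]:
    """For each source embedding, return the index of the most similar target embedding."""
    return [max(range(len(targets)), key=lambda j: cosine(s, targets[j])) for s in sources]


# Hypothetical toy embeddings: in practice these would come from a
# cross-lingual sentence encoder applied to English and German sentences.
english = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.6]]
german = [[0.1, 0.7, 0.7], [0.8, 0.2, 0.1]]

print(nearest_neighbors(english, german))  # [1, 0]
```

Here the first English vector is matched to the second German vector and vice versa; with a real encoder, such mutual nearest neighbours are candidate translation pairs.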

PMIndia -- A Collection of Parallel Corpora of Languages of India

bhaddow/pmindia-crawler 27 Jan 2020

Parallel text is required for building high-quality machine translation (MT) systems, as well as for other multilingual NLP applications.

XeroAlign: Zero-Shot Cross-lingual Transformer Alignment

huawei-noah/noah-research Findings (ACL) 2021

The introduction of pretrained cross-lingual language models brought decisive improvements to multilingual NLP tasks.

Improving Cross-Lingual Word Embeddings by Meeting in the Middle

yeraidm/meemi EMNLP 2018

Cross-lingual word embeddings are becoming increasingly important in multilingual NLP.
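A common baseline for building cross-lingual word embeddings is to learn an orthogonal map from one monolingual space to another using a seed dictionary of translation pairs (orthogonal Procrustes). The sketch below shows only that generic alignment step; it is not the "meeting in the middle" method of this paper, and the random matrices are synthetic data chosen so the correct answer is known.

```python
import numpy as np


def procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Return the orthogonal W minimizing ||X @ W - Y||_F.

    Row i of X (source language) and row i of Y (target language)
    are assumed to be embeddings of a translation pair.
    """
    # With the SVD X^T Y = U S V^T, the optimal orthogonal map is W = U V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                  # synthetic "source-language" vectors
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # a random orthogonal matrix
Y = X @ Q                                     # synthetic "target-language" vectors: a rotated copy

W = procrustes(X, Y)
print(np.allclose(W, Q))  # True
```

Because the target space here is an exact rotation of the source space, Procrustes recovers the rotation; real embedding spaces are only approximately isomorphic, which is part of what refinements such as this paper's aim to address.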

Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation

bheinzerling/subword-sequence-tagging ACL 2019

Pretrained contextual and non-contextual subword embeddings have become available in over 250 languages, allowing massively multilingual NLP.
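Using subword representations for sequence tagging requires mapping word-level tags onto subword tokens. One widespread convention, shown here as an assumption rather than this paper's exact setup, gives each word's tag to its first subword and marks continuation pieces to be ignored. The `toy_split` function is a hypothetical stand-in for a real subword segmenter such as BPE.

```python
def align_labels(words, labels, split):
    """Expand word-level tags to subword level.

    The first subword of each word keeps the word's tag; continuation
    subwords get None so they can be skipped in the loss and in evaluation.
    """
    subwords, sub_labels = [], []
    for word, label in zip(words, labels):
        pieces = split(word) or [word]  # never drop a word, even if the splitter returns nothing
        subwords.extend(pieces)
        sub_labels.extend([label] + [None] * (len(pieces) - 1))
    return subwords, sub_labels


def toy_split(word, size=4):
    """Hypothetical splitter: fixed-size character chunks (a stand-in for BPE)."""
    return [word[i:i + size] for i in range(0, len(word), size)]


words = ["Angela", "Merkel", "visited", "Paris"]
labels = ["B-PER", "I-PER", "O", "B-LOC"]
subwords, sub_labels = align_labels(words, labels, toy_split)
print(subwords)
print(sub_labels)
```

At prediction time the same alignment is reversed: the tag predicted for each word's first subword is taken as the word's tag.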

Simultaneous Translation and Paraphrase for Language Education

duolingo/duolingo-sharedtask-2020 WS 2020

We present the task of Simultaneous Translation and Paraphrasing for Language Education (STAPLE).
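Assuming a system's output for each prompt is a set of translations scored against a set of accepted reference translations, a simple set-based F1 captures the trade-off between producing many paraphrases (recall) and producing only correct ones (precision). This is an unweighted illustration; the official shared-task scoring may weight references differently, and the Portuguese strings below are hypothetical examples.

```python
def set_f1(predicted: set[str], references: set[str]) -> float:
    """Unweighted F1 between predicted translations and accepted references."""
    if not predicted or not references:
        return 0.0
    hits = len(predicted & references)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(references)
    return 2 * precision * recall / (precision + recall)


# Hypothetical prompt "I am a cat" with two accepted Portuguese translations.
refs = {"eu sou um gato", "sou um gato"}
preds = {"eu sou um gato", "eu é um gato"}
print(set_f1(preds, refs))  # 0.5
```

One of two predictions is correct (precision 0.5) and one of two references is covered (recall 0.5), so F1 is 0.5.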

fugashi, a Tool for Tokenizing Japanese in Python

polm/fugashi EMNLP (NLPOSS) 2020

Recent years have seen an increase in the number of large-scale multilingual NLP projects.

Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis

om304/multi-spa-verb COLING 2020

We present the first evaluation of the applicability of a spatial arrangement method (SpAM) to a typologically diverse language sample, and its potential to produce semantic evaluation resources to support multilingual NLP, with a focus on verb semantics.
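In a spatial arrangement method, annotators place words on a two-dimensional canvas so that similar words end up close together; pairwise distances are then read off as (dis)similarity judgements. The sketch below assumes a simple normalization, similarity = 1 - d / d_max, purely for illustration; the paper's actual scoring procedure may differ, and the verb coordinates are hypothetical.

```python
import math
from itertools import combinations


def spam_similarities(positions: dict[str, tuple[float, float]]) -> dict[tuple[str, str], float]:
    """Turn 2-D word placements into pairwise similarity scores in [0, 1].

    Closer words count as more similar: similarity = 1 - d / d_max, where d is
    the Euclidean distance and d_max the largest pairwise distance on the canvas.
    """
    dists = {
        (a, b): math.dist(positions[a], positions[b])
        for a, b in combinations(sorted(positions), 2)
    }
    if not dists:
        return {}
    d_max = max(dists.values())
    if d_max == 0:
        # All words stacked on one spot: treat every pair as maximally similar.
        return {pair: 1.0 for pair in dists}
    return {pair: 1 - d / d_max for pair, d in dists.items()}


# Hypothetical arrangement of three verbs by one annotator.
canvas = {"run": (0.0, 0.0), "sprint": (1.0, 0.0), "sleep": (4.0, 3.0)}
sims = spam_similarities(canvas)
print(round(sims[("run", "sprint")], 2))  # 0.8
```

Averaging such scores over annotators would yield a graded verb-similarity resource of the kind the abstract describes.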

SICK-NL: A Dataset for Dutch Natural Language Inference

gijswijnholds/sick_nl 14 Jan 2021

We present SICK-NL (read: signal), a dataset targeting Natural Language Inference in Dutch.