Multilingual NLP
17 papers with code • 0 benchmarks • 1 dataset
Most implemented papers
Language-agnostic BERT Sentence Embedding
While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding-based transfer learning (Reimers and Gurevych, 2019), BERT-based cross-lingual sentence embeddings have yet to be explored.
PMIndia -- A Collection of Parallel Corpora of Languages of India
Parallel text is required for building high-quality machine translation (MT) systems, as well as for other multilingual NLP applications.
XeroAlign: Zero-Shot Cross-lingual Transformer Alignment
The introduction of pretrained cross-lingual language models brought decisive improvements to multilingual NLP tasks.
Improving Cross-Lingual Word Embeddings by Meeting in the Middle
Cross-lingual word embeddings are becoming increasingly important in multilingual NLP.
Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation
Pretrained contextual and non-contextual subword embeddings have become available in over 250 languages, allowing massively multilingual NLP.
Simultaneous Translation and Paraphrase for Language Education
We present the task of Simultaneous Translation and Paraphrasing for Language Education (STAPLE).
fugashi, a Tool for Tokenizing Japanese in Python
Recent years have seen an increase in the number of large-scale multilingual NLP projects.
Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis
We present the first evaluation of the applicability of a spatial arrangement method (SpAM) to a typologically diverse language sample, and its potential to produce semantic evaluation resources to support multilingual NLP, with a focus on verb semantics.
Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing
Finally, we create a demo video for Trankit at: https://youtu.be/q0KGP3zGjGc.
SICK-NL: A Dataset for Dutch Natural Language Inference
We present SICK-NL (read: signal), a dataset targeting Natural Language Inference in Dutch.