Multilingual NLP
34 papers with code • 0 benchmarks • 4 datasets
Benchmarks
These leaderboards are used to track progress in Multilingual NLP
Libraries
Use these libraries to find Multilingual NLP models and implementations
Most implemented papers
fugashi, a Tool for Tokenizing Japanese in Python
Recent years have seen an increase in the number of large-scale multilingual NLP projects.
Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis
We present the first evaluation of the applicability of a spatial arrangement method (SpAM) to a typologically diverse language sample, and its potential to produce semantic evaluation resources to support multilingual NLP, with a focus on verb semantics.
Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing
Finally, we create a demo video for Trankit at: https://youtu.be/q0KGP3zGjGc.
SICK-NL: A Dataset for Dutch Natural Language Inference
We present SICK-NL (read: signal), a dataset targeting Natural Language Inference in Dutch.
Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages
We mine the parallel sentences from the web by combining many corpora, tools, and methods: (a) web-crawled monolingual corpora, (b) document OCR for extracting sentences from scanned documents, (c) multilingual representation models for aligning sentences, and (d) approximate nearest neighbor search for searching in a large collection of sentences.
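Steps (c) and (d) above amount to embedding sentences in both languages and retrieving, for each source sentence, its nearest target-language neighbour. A minimal NumPy sketch of that retrieval step, with random toy embeddings standing in for a real multilingual encoder and exact cosine search standing in for approximate nearest-neighbour search, might look like:

```python
import numpy as np

# Toy sentence embeddings; in practice these would come from a
# multilingual representation model, not random vectors.
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 16))   # 5 source sentences, 16-dim embeddings
tgt = rng.normal(size=(8, 16))   # 8 candidate target sentences

def normalize(m):
    """L2-normalise rows so dot products become cosine similarities."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)

sims = normalize(src) @ normalize(tgt).T   # (5, 8) cosine-similarity matrix
best = sims.argmax(axis=1)                 # nearest target per source sentence
scores = sims[np.arange(len(src)), best]

# Keep only pairs above a similarity threshold, mimicking mining filters.
pairs = [(i, int(j)) for i, (j, s) in enumerate(zip(best, scores)) if s > 0.0]
```

At corpus scale the exact `argmax` over all pairs is infeasible, which is why the paper uses approximate nearest-neighbour search over the same similarity space.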
Analysing The Impact Of Linguistic Features On Cross-Lingual Transfer
As a result, one should not expect that for a target language L1 there is a single source language L2 that is the best choice for any NLP task (for instance, for Bulgarian, the best source language is French on POS tagging, Russian on NER and Thai on NLI).
Cultural and Geographical Influences on Image Translatability of Words across Languages
We find that images of words are not always invariant across languages, and that language pairs with a shared culture (a common language family, ethnicity, or religion) show improved image translatability (i.e., more similar images for similar words) compared to pairs without such ties, regardless of their geographic proximity.
HONEST: Measuring Hurtful Sentence Completion in Language Models
Our results show that 4.3% of the time, language models complete a sentence with a hurtful word.
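The reported rate is the fraction of completions that contain a term from a hurtful-word lexicon. A toy sketch of that counting step, with a hypothetical two-word lexicon and hard-coded completions standing in for the HurtLex lexica and real model generations used in the paper:

```python
# Hypothetical lexicon and completions for illustration only; the HONEST
# benchmark uses the HurtLex lexica and actual language-model output.
hurtlex = {"stupid", "ugly"}
completions = [
    "the woman is a doctor",
    "the man is stupid",
    "the girl is kind",
]

def hurtful_rate(completions, lexicon):
    """Fraction of completions containing at least one lexicon word."""
    hits = sum(any(w in lexicon for w in c.split()) for c in completions)
    return hits / len(completions)

rate = hurtful_rate(completions, hurtlex)
```

Here one of the three completions matches the lexicon, so the rate is 1/3; the paper computes the same kind of ratio over many templates and languages.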
Improving Word Translation via Two-Stage Contrastive Learning
As Stage C1, we propose to refine standard cross-lingual linear maps between static word embeddings (WEs) via a contrastive learning objective; we also show how to integrate it into the self-learning procedure for even more refined cross-lingual maps.
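The "standard cross-lingual linear maps" that Stage C1 refines are conventionally obtained via orthogonal Procrustes over a seed dictionary. A minimal NumPy sketch of that baseline map (toy embeddings with a hidden rotation; the paper's contrastive refinement itself is not shown here):

```python
import numpy as np

# Toy aligned word embeddings: X (source language), Y (target language).
# Y is constructed as a hidden rotation of X so the recovered map is checkable.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 8))
R_true, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # hidden orthogonal map
Y = X @ R_true

# Orthogonal Procrustes: W = argmin ||XW - Y||_F s.t. W orthogonal,
# solved in closed form via the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt
```

On this synthetic data `W` recovers the hidden rotation exactly; with real static word embeddings the map is only approximate, which is what motivates the paper's contrastive and self-learning refinements.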