Cross-Lingual Document Classification

12 papers with code • 10 benchmarks • 2 datasets

Cross-lingual document classification is the task of using data and models from a language with ample resources (e.g., English) to solve classification tasks in another language, commonly a low-resource one.
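
Here is a minimal sketch of the zero-shot setting. It assumes the two languages already share an embedding space. The toy vectors, the nearest-centroid classifier and all function names are hypothetical stand-ins for real cross-lingual embeddings and models. The point is only that a classifier fit on English documents can be applied unchanged to Spanish ones.

```python
import math

# Hypothetical 2-d bilingual embeddings: each English word and its Spanish
# translation have (almost) the same vector, i.e. the space is already aligned.
EMB = {
    "goal": (0.90, 0.10), "gol": (0.88, 0.12),
    "match": (0.80, 0.20), "partido": (0.82, 0.18),
    "bank": (0.10, 0.90), "banco": (0.12, 0.88),
    "profit": (0.20, 0.80), "beneficio": (0.18, 0.82),
}


def embed(doc):
    """Average the vectors of known words into one document vector."""
    vecs = [EMB[w] for w in doc.lower().split() if w in EMB]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))


def train_centroids(docs, labels):
    """Nearest-centroid classifier fit on source-language documents only."""
    grouped = {}
    for doc, label in zip(docs, labels):
        grouped.setdefault(label, []).append(embed(doc))
    return {
        label: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for label, vecs in grouped.items()
    }


def predict(centroids, doc):
    """Assign the label whose centroid is closest (Euclidean) to the document."""
    x = embed(doc)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Train on English labels only ...
centroids = train_centroids(["goal match", "bank profit"], ["sports", "finance"])
# ... then classify Spanish documents without any Spanish labels.
print(predict(centroids, "partido gol"))      # sports
print(predict(centroids, "beneficio banco"))  # finance
```

In practice the shared space would come from cross-lingual word or sentence embeddings, or from a multilingual pretrained model, rather than from a hand-written table.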

Benchmarks

These leaderboards are used to track progress in Cross-Lingual Document Classification.

Dataset	Best Model
MLDoc Zero-Shot English-to-Spanish	XLMft UDA
MLDoc Zero-Shot English-to-French	XLMft UDA
MLDoc Zero-Shot English-to-German	XLMft UDA
MLDoc Zero-Shot English-to-Chinese	XLMft UDA
MLDoc Zero-Shot English-to-Russian	XLMft UDA
MLDoc Zero-Shot English-to-Italian	MultiFiT, pseudo
MLDoc Zero-Shot English-to-Japanese	MultiFiT, pseudo
Reuters RCV1/RCV2 English-to-German	Biinclusion (Euro500kReuters)
Reuters RCV1/RCV2 German-to-English	Biinclusion (Euro500kReuters)
MLDoc Zero-Shot German-to-French	BiLSTM (Europarl)

Datasets

RCV1
MLDoc

Subtasks

News Classification

Latest papers

Multilingual and cross-lingual document classification: A meta-learning approach

mrvoh/meta_learning_multilingual_doc_classification • EACL 2021

The great majority of languages in the world are considered under-resourced for the successful application of deep learning methods.

27 Jan 2021

Robust Cross-lingual Embeddings from Parallel Sentences

epfml/sent2vec • 28 Dec 2019

Recent advances in cross-lingual word embeddings have primarily relied on mapping-based methods, which project pretrained word embeddings from different languages into a shared space through a linear transformation.
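
Mapping-based methods typically learn the linear map from a seed dictionary of translation pairs. They often constrain it to be orthogonal, which is the orthogonal Procrustes problem and is solved in d dimensions with an SVD. The sketch below is a hypothetical 2-d illustration of that baseline, not the listed paper's own method. It fits a rotation in closed form so that it needs no external libraries. The function names and the seed vectors are assumptions.

```python
import math


def fit_rotation(src, tgt):
    """Angle theta of the 2-d rotation R minimising sum ||R x - y||^2 over seed pairs.

    Expanding the squared norm, only the cross term depends on R, so we maximise
    sum y . (R x) = cos(theta) * dot + sin(theta) * cross, giving theta = atan2(cross, dot).
    """
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(src, tgt))
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(src, tgt))
    return math.atan2(cross, dot)


def rotate(v, theta):
    """Apply the learned linear map (a rotation) to a source-language vector."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


# Hypothetical seed dictionary: target vectors are the source vectors rotated by 90 degrees.
src = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
tgt = [(0.0, 1.0), (-1.0, 0.0), (-0.8, 0.6)]
theta = fit_rotation(src, tgt)
print(round(math.degrees(theta), 6))  # 90.0
```

Once fitted, the rotation can be applied to source words outside the seed dictionary, which is what makes the shared space usable for transfer.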

1,186 GitHub stars

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

labmlai/annotated_deep_learning_paper_implementations • 4 Oct 2019

Large deep learning models offer significant accuracy gains, but training billions to trillions of parameters is challenging.

47,189 GitHub stars

Bridging the domain gap in cross-lingual document classification

laiguokun/xlu-data • 16 Sep 2019

We consider the setting of semi-supervised cross-lingual understanding, where labeled data is available in a source language (English), but only unlabeled data is available in the target language.
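
The leaderboard's best MLDoc results are labeled "XLMft UDA", which points to unsupervised data augmentation (UDA). In UDA, unlabeled target-language documents contribute through a consistency term. That term penalizes the model when its prediction on an augmented copy of a document drifts from its prediction on the original. Below is a minimal sketch of such an objective, assuming logits have already been computed by some classifier. The function names and the weight lam are hypothetical. The augmentation itself and the further training details of the published method are omitted.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Supervised loss for one labeled source-language (English) document."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) between two distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def semi_supervised_loss(src_logits, src_labels, tgt_logits, tgt_aug_logits, lam=1.0):
    """Supervised source loss plus a consistency term on unlabeled target documents.

    tgt_logits are predictions on the original target-language documents and act
    as a fixed target (in a real training loop no gradient flows through them);
    tgt_aug_logits are predictions on augmented copies of the same documents.
    """
    sup = sum(cross_entropy(z, y) for z, y in zip(src_logits, src_labels)) / len(src_labels)
    cons = sum(
        kl_divergence(softmax(z), softmax(z_aug))
        for z, z_aug in zip(tgt_logits, tgt_aug_logits)
    ) / len(tgt_logits)
    return sup + lam * cons
```

If the model predicts the same distribution for a document and its augmented copy, the consistency term is zero and only the supervised English loss remains. For example, `semi_supervised_loss([[2.0, 0.0]], [0], [[1.0, -1.0]], [[1.0, -1.0]])` equals `cross_entropy([2.0, 0.0], 0)`.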

MultiFiT: Efficient Multi-lingual Language Model Fine-tuning

n-waves/multifit • IJCNLP 2019

Pretrained language models are promising particularly for low-resource languages as they only require unlabelled data.
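
MultiFiT's zero-shot setting uses an existing pretrained cross-lingual model as a teacher. The teacher labels target-language documents, and the monolingual model is then trained on those pseudo-labels; this is presumably what the "MultiFiT, pseudo" leaderboard entries denote. Below is a minimal sketch of that teacher-student loop. Both the keyword-rule teacher and the word-vote student are hypothetical stand-ins for the real models.

```python
from collections import Counter, defaultdict


def teacher_predict(doc):
    """Hypothetical stand-in for a cross-lingual teacher trained on English labels."""
    return "finance" if ("banco" in doc or "bolsa" in doc) else "sports"


def train_student(docs, labels):
    """Hypothetical monolingual student: counts how often each word co-occurs with each label."""
    word_votes = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        for word in doc.split():
            word_votes[word][label] += 1

    def predict(doc):
        votes = Counter()
        for word in doc.split():
            votes.update(word_votes.get(word, Counter()))
        return votes.most_common(1)[0][0] if votes else None

    return predict


# Unlabeled Spanish documents: the teacher supplies the labels.
unlabeled_es = ["el banco sube", "la bolsa cae", "gol del equipo", "el equipo gana"]
pseudo_labels = [teacher_predict(d) for d in unlabeled_es]
student = train_student(unlabeled_es, pseudo_labels)
print(pseudo_labels)                    # ['finance', 'finance', 'sports', 'sports']
print(student("la bolsa y el banco"))  # finance
```

No target-language gold labels are used anywhere. All supervision reaches the student through the teacher's predictions.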

282 GitHub stars

10 Sep 2019

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

facebookresearch/LASER • TACL 2019

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.
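
LASER's encoder is a BiLSTM whose hidden states are max-pooled into one fixed-size vector. As a result, sentences of any length, in any supported language, land in the same space and can be compared directly. Below is a minimal sketch of just the pooling and comparison step. It uses hypothetical token states in place of real encoder output.

```python
import math


def max_pool(token_states):
    """Element-wise max over per-token vectors: one fixed-size vector per sentence."""
    return [max(values) for values in zip(*token_states)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical 3-d encoder states for an English sentence (3 tokens) and its
# French translation (4 tokens); real LASER states come from a trained BiLSTM.
en_states = [[0.1, 0.7, 0.0], [0.9, 0.2, 0.1], [0.3, 0.1, 0.8]]
fr_states = [[0.8, 0.3, 0.1], [0.2, 0.6, 0.0], [0.1, 0.0, 0.7], [0.2, 0.1, 0.1]]

en_vec, fr_vec = max_pool(en_states), max_pool(fr_states)
print(en_vec)  # [0.9, 0.7, 0.8]
print(fr_vec)  # [0.8, 0.6, 0.7]
print(cosine(en_vec, fr_vec))
```

Both sentences yield 3-d vectors despite their different lengths. For zero-shot transfer, a classifier trained on English sentence vectors can then be applied to vectors from other languages.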

3,515 GitHub stars

26 Dec 2018

A Corpus for Multilingual Document Classification in Eight Languages

n-waves/multifit • LREC 2018

In addition, we have observed that the class prior distributions differ significantly between the languages.
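
Differing class priors matter because a classifier trained on the source language implicitly learns the source priors. Below is a minimal sketch of how such a difference can be quantified. It computes per-language class priors from label counts and compares them with total variation distance. The four codes are the top-level RCV categories that MLDoc uses (corporate/industrial, economics, government/social, markets). The counts and function names are invented for illustration.

```python
from collections import Counter


def class_priors(labels):
    """Relative frequency of each class in one language's labeled data."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts[c] / n for c in sorted(counts)}


def total_variation(p, q):
    """Half the L1 distance between two class distributions (0 = identical, 1 = disjoint)."""
    classes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)


# Hypothetical label samples for two languages over the four top-level categories.
en_labels = ["CCAT"] * 4 + ["ECAT"] * 2 + ["GCAT"] * 2 + ["MCAT"] * 2
de_labels = ["CCAT"] * 2 + ["ECAT"] * 1 + ["GCAT"] * 2 + ["MCAT"] * 5
p_en, p_de = class_priors(en_labels), class_priors(de_labels)
print(p_en)  # {'CCAT': 0.4, 'ECAT': 0.2, 'GCAT': 0.2, 'MCAT': 0.2}
print(p_de)  # {'CCAT': 0.2, 'ECAT': 0.1, 'GCAT': 0.2, 'MCAT': 0.5}
print(round(total_variation(p_en, p_de), 2))  # 0.3
```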

282 GitHub stars

24 May 2018

Learning Crosslingual Word Embeddings without Bilingual Corpora

longdt219/xlingualemb • EMNLP 2016

Crosslingual word embeddings represent lexical items from different languages in the same vector space, enabling transfer of NLP tools.
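
Once words from two languages share one vector space, a translation can be read off by nearest-neighbour search. This is one simple way such embeddings transfer lexical knowledge across languages. Below is a minimal sketch with hypothetical 3-d vectors. How the listed paper learns the space is not shown.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical English and Italian vectors that already live in one shared space.
EN = {"dog": (0.9, 0.1, 0.0), "house": (0.0, 0.8, 0.3)}
IT = {"cane": (0.85, 0.15, 0.05), "casa": (0.05, 0.75, 0.35), "sole": (0.1, 0.1, 0.9)}


def translate(word):
    """Return the Italian word whose vector is most cosine-similar to the English one."""
    v = EN[word]
    return max(IT, key=lambda w: cosine(v, IT[w]))


print(translate("dog"))    # cane
print(translate("house"))  # casa
```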

30 Jun 2016

Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification

ccsasuke/adan • TACL 2018

To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exists.
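
In its basic form, ADAN's feature extractor is a deep averaging network over bilingual word embeddings. Averaging makes the representation independent of document length, and shared embeddings let documents in either language enter the same feature space. Below is a minimal sketch of that feature extractor only, with hypothetical embeddings and weights. The adversarial language discriminator, which the extractor is trained to fool, is omitted.

```python
# Hypothetical bilingual word embeddings: English/Spanish translations share a vector.
BWE = {"good": [1.0, 0.0], "bueno": [1.0, 0.0], "bad": [0.0, 1.0], "malo": [0.0, 1.0]}
# Hypothetical weights of one hidden layer (2 inputs -> 2 ReLU units).
W = [[1.0, -1.0], [-1.0, 1.0]]
B = [0.0, 0.0]


def deep_average(tokens):
    """Average the embeddings of known tokens, then apply one ReLU layer."""
    vecs = [BWE[t] for t in tokens if t in BWE]
    if not vecs:
        return [0.0] * len(W)
    avg = [sum(col) / len(vecs) for col in zip(*vecs)]
    return [max(0.0, sum(w * x for w, x in zip(row, avg)) + b) for row, b in zip(W, B)]


en_feat = deep_average(["good", "good", "bad"])
es_feat = deep_average(["bueno", "bueno", "malo"])
print(en_feat == es_feat)  # True
```

Because the two documents are translations with shared embeddings, they map to identical features. A sentiment classifier trained on English features therefore applies directly to Spanish ones. In real data the embeddings are only approximately aligned, which is the gap the adversarial training targets.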

06 Jun 2016

BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

eske/multivec • 9 Oct 2014

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data.
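
BilBOWA avoids word alignments by working at the sentence level. As we understand it, its cross-lingual term is the squared distance between the mean word vector of a sentence and the mean word vector of its translation. Below is a minimal sketch of that loss and one hand-derived gradient step. The function names and learning rate are hypothetical. In the full model this term is trained jointly with monolingual skip-gram objectives, which keep the vectors from collapsing to a single point.

```python
def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def bilbowa_loss(en_sent, fr_sent):
    """Squared distance between the mean word vectors of a parallel sentence pair."""
    diff = [a - b for a, b in zip(mean_vector(en_sent), mean_vector(fr_sent))]
    return sum(d * d for d in diff)


def sgd_step(en_sent, fr_sent, lr=0.1):
    """One gradient step on bilbowa_loss w.r.t. every word vector in the pair (in place).

    With d = mean(en) - mean(fr), the gradient is 2*d/m for each of the m English
    vectors and -2*d/n for each of the n French vectors.
    """
    diff = [a - b for a, b in zip(mean_vector(en_sent), mean_vector(fr_sent))]
    m, n = len(en_sent), len(fr_sent)
    for v in en_sent:
        for k in range(len(v)):
            v[k] -= lr * 2 * diff[k] / m
    for v in fr_sent:
        for k in range(len(v)):
            v[k] += lr * 2 * diff[k] / n


en = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical vectors of a 2-word English sentence
fr = [[0.0, 0.0]]              # hypothetical vector of its 1-word French translation
before = bilbowa_loss(en, fr)
sgd_step(en, fr)
after = bilbowa_loss(en, fr)
print(before, after)
```

Only the sentence pairing is needed, never which English word corresponds to which French word.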

116 GitHub stars