Cross-Lingual Document Classification

12 papers with code • 10 benchmarks • 2 datasets

Cross-lingual document classification is the task of using data and models from a resource-rich language (e.g., English) to solve classification tasks in another, typically low-resource, language.
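
A minimal sketch of the zero-shot form of this task, assuming documents from both languages can be mapped into a shared embedding space. The toy vectors, the vocabulary, and the helper names (`EMBEDDINGS`, `embed`, `train_centroids`, `predict`) are hypothetical and chosen only for illustration; nearest-centroid classification stands in for whatever classifier a real system would use.

```python
from math import sqrt

# Hypothetical toy cross-lingual embeddings (an assumption for illustration):
# words that are translations of each other get similar vectors.
EMBEDDINGS = {
    # English
    "market": (1.0, 0.0), "stocks": (0.9, 0.1),
    "match": (0.0, 1.0), "team": (0.1, 0.9),
    # Spanish
    "mercado": (0.95, 0.05), "acciones": (0.9, 0.1),
    "partido": (0.05, 0.95), "equipo": (0.1, 0.9),
}


def embed(doc):
    """Represent a document as the average vector of its known tokens."""
    vecs = [EMBEDDINGS[t] for t in doc.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(labeled_docs):
    """Compute one centroid per class from source-language labeled documents."""
    by_label = {}
    for doc, label in labeled_docs:
        by_label.setdefault(label, []).append(embed(doc))
    return {
        label: tuple(sum(c) / len(vecs) for c in zip(*vecs))
        for label, vecs in by_label.items()
    }


def predict(centroids, doc):
    """Assign the class whose centroid is most similar to the document."""
    vec = embed(doc)
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Train on English only, then classify Spanish documents with no Spanish labels.
english_train = [("stocks market", "business"), ("team match", "sports")]
centroids = train_centroids(english_train)
spanish_docs = ["mercado acciones", "equipo partido"]
spanish_predictions = [predict(centroids, d) for d in spanish_docs]
```

The classifier never sees a labeled Spanish example; transfer works here only because the assumed embedding space places translations close together.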

Benchmarks

These leaderboards track progress in cross-lingual document classification.

Dataset	Best Model
MLDoc Zero-Shot English-to-Spanish	XLMft UDA
MLDoc Zero-Shot English-to-French	XLMft UDA
MLDoc Zero-Shot English-to-German	XLMft UDA
MLDoc Zero-Shot English-to-Chinese	XLMft UDA
MLDoc Zero-Shot English-to-Russian	XLMft UDA
MLDoc Zero-Shot English-to-Italian	MultiFiT, pseudo
MLDoc Zero-Shot English-to-Japanese	MultiFiT, pseudo
Reuters RCV1/RCV2 English-to-German	Biinclusion (Euro500kReuters)
Reuters RCV1/RCV2 German-to-English	Biinclusion (Euro500kReuters)
MLDoc Zero-Shot German-to-French	BiLSTM (Europarl)

Datasets

RCV1
MLDoc

Subtasks

News Classification

Most implemented papers

Bridging the domain gap in cross-lingual document classification

laiguokun/xlu-data • 16 Sep 2019

We consider the setting of semi-supervised cross-lingual understanding, where labeled data is available in a source language (English), but only unlabeled data is available in the target language.
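
One common way to exploit unlabeled target-language data in a setting like this is self-training with pseudo-labels. The sketch below illustrates that general idea only and is not presented as the method of the paper above. It assumes documents are already embedded in a shared cross-lingual space; the 2-D vectors, the `min_margin` threshold, and all function names are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return tuple(sum(c) / len(vectors) for c in zip(*vectors))


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(labeled):
    """Nearest-centroid model: one centroid per label."""
    by_label = {}
    for vec, label in labeled:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict_with_margin(centroids, vec):
    """Return the nearest label and a confidence margin (needs >= 2 labels)."""
    ranked = sorted(centroids, key=lambda label: sq_dist(centroids[label], vec))
    best, runner_up = ranked[0], ranked[1]
    margin = sq_dist(centroids[runner_up], vec) - sq_dist(centroids[best], vec)
    return best, margin


def self_train(source_labeled, target_unlabeled, min_margin=0.3, rounds=3):
    """Refit on source labels plus confident pseudo-labels for target docs."""
    centroids = fit(source_labeled)
    for _ in range(rounds):
        pseudo = []
        for vec in target_unlabeled:
            label, margin = predict_with_margin(centroids, vec)
            if margin >= min_margin:  # keep only confident predictions
                pseudo.append((vec, label))
        centroids = fit(source_labeled + pseudo)
    return centroids


# Labeled source-language (English) documents, as hypothetical embeddings.
source_labeled = [
    ((1.0, 0.0), "business"), ((0.9, 0.1), "business"),
    ((0.0, 1.0), "sports"), ((0.1, 0.9), "sports"),
]
# Unlabeled target-language documents, shifted to mimic a domain gap.
target_unlabeled = [(0.8, 0.3), (0.7, 0.4), (0.3, 0.8), (0.4, 0.7)]

adapted = self_train(source_labeled, target_unlabeled)
held_out_label, _ = predict_with_margin(adapted, (0.6, 0.45))
```

After self-training, the class centroids move toward the target-language documents, which is the intended effect of using the unlabeled data.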
