A Corpus for Multilingual Document Classification in Eight Languages

LREC 2018 · Holger Schwenk, Xian Li

Cross-lingual document classification aims at training a document classifier on resources in one language and transferring it to a different language without any additional resources. Several approaches have been proposed in the literature and the current best practice is to evaluate them on a subset of the Reuters Corpus Volume 2...
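The zero-shot setup described in the abstract (train on one language, evaluate on another with no target-language training data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 2-d vectors stand in for document embeddings in a shared multilingual space, the labels `economics` and `sports` are made up, and the nearest-centroid classifier is a simple stand-in rather than any model evaluated on MLDoc.

```python
from collections import defaultdict


def train_centroids(vectors, labels):
    """Average the training vectors of each class into one centroid."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


def accuracy(centroids, vectors, labels):
    """Fraction of documents whose predicted label matches the gold label."""
    correct = sum(predict(centroids, v) == y for v, y in zip(vectors, labels))
    return correct / len(labels)


# Hypothetical toy data: embeddings assumed to live in one shared
# multilingual space, so a classifier fit on English can score French.
english_train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
english_train_y = ["economics", "economics", "sports", "sports"]
french_test_x = [[0.7, 0.3], [0.3, 0.6], [0.6, 0.5]]
french_test_y = ["economics", "sports", "sports"]

# Train on English only, then evaluate directly on French (zero-shot).
centroids = train_centroids(english_train_x, english_train_y)
zero_shot_accuracy = accuracy(centroids, french_test_x, french_test_y)
print(f"Zero-shot English-to-French accuracy: {zero_shot_accuracy:.2%}")
```

The key point of the sketch is that no French example is seen during training; transfer works only to the extent that the assumed shared embedding space places documents of the same class close together across languages.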

| Task | Dataset | Model | Metric name | Metric value | Global rank |
|---|---|---|---|---|---|
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-French | MultiCCA + CNN | Accuracy | 72.38% | #6 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-French | BiLSTM (UN) | Accuracy | 74.52% | #4 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-French | BiLSTM (Europarl) | Accuracy | 72.83% | #5 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-German | MultiCCA + CNN | Accuracy | 81.20% | #4 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-German | BiLSTM (Europarl) | Accuracy | 71.83% | #5 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-Spanish | BiLSTM (Europarl) | Accuracy | 66.65% | #6 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-Spanish | MultiCCA + CNN | Accuracy | 72.50% | #4 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-Spanish | BiLSTM (UN) | Accuracy | 69.50% | #5 |