A Corpus for Multilingual Document Classification in Eight Languages

LREC 2018 · Holger Schwenk, Xian Li

Cross-lingual document classification aims at training a document classifier on resources in one language and transferring it to a different language without any additional resources. Several approaches have been proposed in the literature, and the current best practice is to evaluate them on a subset of the Reuters Corpus Volume 2...

| Task | Dataset | Model | Metric name | Metric value | Global rank |
|---|---|---|---|---|---|
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-French | MultiCCA + CNN | Accuracy | 72.38% | #4 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-French | BiLSTM (UN) | Accuracy | 74.52% | #2 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-French | BiLSTM (Europarl) | Accuracy | 72.83% | #3 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-German | MultiCCA + CNN | Accuracy | 81.20% | #2 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-German | BiLSTM (Europarl) | Accuracy | 71.83% | #3 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-Spanish | BiLSTM (Europarl) | Accuracy | 66.65% | #4 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-Spanish | MultiCCA + CNN | Accuracy | 72.50% | #2 |
| Cross-Lingual Document Classification | MLDoc Zero-Shot English-to-Spanish | BiLSTM (UN) | Accuracy | 69.50% | #3 |
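
The zero-shot accuracy figures above follow one protocol: fit a classifier on English documents only, then score it on documents in another language that live in the same embedding space. The sketch below illustrates that protocol and nothing more. It swaps in a nearest-centroid classifier instead of the listed MultiCCA + CNN or BiLSTM models, and the two-dimensional vectors, the label names `business` and `politics`, and all function names are hypothetical placeholders, not data or code from the paper.

```python
from collections import defaultdict


def train_centroids(vectors, labels):
    """Average the training vectors of each class into one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


def accuracy(centroids, vectors, labels):
    """Fraction of documents whose predicted label matches the gold label."""
    correct = sum(predict(centroids, v) == y for v, y in zip(vectors, labels))
    return correct / len(labels)


# Toy stand-ins for document embeddings in a shared multilingual space
# (assumption: real systems would produce these with a multilingual encoder).
en_train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
en_train_y = ["business", "business", "politics", "politics"]

# Target-language test set: never seen during training (zero-shot).
fr_test_x = [[0.7, 0.3], [0.3, 0.7], [0.6, 0.4], [0.45, 0.55]]
fr_test_y = ["business", "politics", "business", "business"]

centroids = train_centroids(en_train_x, en_train_y)
zero_shot_acc = accuracy(centroids, fr_test_x, fr_test_y)
print(f"Zero-shot English-to-French accuracy: {zero_shot_acc:.2%}")  # 75.00%
```

The key design point is that no target-language labels enter `train_centroids`; the transfer relies entirely on both languages sharing one representation space.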