Cross-Lingual Document Classification
12 papers with code • 10 benchmarks • 2 datasets
Cross-lingual document classification is the task of using the data and models available for a high-resource language (e.g., English) to solve classification tasks in another, typically low-resource, language.
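A minimal sketch of the zero-shot transfer setup this task implies: train a classifier on labeled English documents in a language-agnostic feature space, then apply it unchanged to the target language. The `embed` vectorizer below is a hypothetical stand-in; real systems would use a multilingual encoder.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for a language-agnostic document encoder;
# a real system would use multilingual embeddings instead.
embed = HashingVectorizer(analyzer="char_wb", ngram_range=(2, 4), n_features=2**16)

# Labeled documents exist only in the high-resource language.
en_docs = ["the market rallied today", "the striker scored twice"]
en_labels = ["business", "sports"]
clf = LogisticRegression().fit(embed.transform(en_docs), en_labels)

# Zero-shot transfer: classify target-language documents through the
# same shared feature space, with no target-language labels.
de_docs = ["der Markt legte heute zu", "der Stürmer traf zweimal"]
print(clf.predict(embed.transform(de_docs)))
```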
Latest papers with no code
Margin-aware Unsupervised Domain Adaptation for Cross-lingual Text Labeling
Unsupervised domain adaptation addresses the problem of leveraging labeled data in a source domain to learn a well-performing model in a target domain where labels are unavailable.
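The paper's specific margin-aware method is not spelled out here; as a point of reference, a generic self-training baseline with a probability-margin filter looks like this (all names and thresholds are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def margin_self_training(X_src, y_src, X_tgt, margin=0.3, rounds=3):
    """Self-training with a top-1/top-2 probability-margin filter: a common
    UDA baseline, not the margin-aware method proposed in the paper."""
    clf = LogisticRegression()
    X, y = X_src, y_src
    for _ in range(rounds):
        clf.fit(X, y)
        proba = np.sort(clf.predict_proba(X_tgt), axis=1)
        confident = (proba[:, -1] - proba[:, -2]) > margin  # decision margin
        if not confident.any():
            break
        # Add confidently pseudo-labeled target examples to the training set.
        X = np.vstack([X_src, X_tgt[confident]])
        y = np.concatenate([y_src, clf.predict(X_tgt[confident])])
    return clf
```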
Wasserstein distances for evaluating cross-lingual embeddings
Word embeddings are high-dimensional vector representations of words that capture their semantic similarity in the vector space.
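One way to compute such a distance in practice, assuming the POT (Python Optimal Transport) package: treat each set of embeddings as a uniform discrete distribution and solve the exact transport problem under a Euclidean ground cost.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def wasserstein(emb_a, emb_b):
    """Exact Wasserstein-1 distance between two sets of embeddings,
    each treated as a uniform discrete distribution over its vectors."""
    a = np.full(len(emb_a), 1.0 / len(emb_a))      # uniform weights
    b = np.full(len(emb_b), 1.0 / len(emb_b))
    M = ot.dist(emb_a, emb_b, metric="euclidean")  # ground cost matrix
    return ot.emd2(a, b, M)                        # optimal-transport cost

rng = np.random.default_rng(0)
src = rng.normal(size=(100, 50))  # e.g., source-language word vectors
tgt = rng.normal(size=(100, 50))  # target-language word vectors
print(wasserstein(src, tgt))
```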
Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification
Text classification must sometimes be applied in a low-resource language with no labeled training data.
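A toy illustration of the subword idea, not the paper's model: character n-gram features let a classifier trained on one language fire on cognates in a related, unlabeled one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Character n-grams capture subword units (stems, affixes) that related
# languages often share, so features learned on the high-resource side
# can match cognates on the low-resource side.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LogisticRegression(),
)
model.fit(["la situación económica", "el partido de fútbol"],  # Spanish (labeled)
          ["economy", "sports"])
# Zero-shot prediction on a related language (Portuguese, unlabeled):
print(model.predict(["a situação econômica"]))
```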
Variational learning across domains with triplet information
This work investigates deep generative models that allow training data from one domain to be used to build a model for another domain.
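The triplet signal can be pictured with a standard triplet margin loss; a minimal PyTorch sketch in which the encoder is a placeholder rather than the paper's generative model:

```python
import torch
import torch.nn as nn

# Placeholder encoder; the paper uses a deep generative model, which this
# sketch does not reproduce -- only the triplet objective is illustrated.
encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64))
triplet = nn.TripletMarginLoss(margin=1.0)

anchor   = encoder(torch.randn(8, 300))  # e.g., source-domain documents
positive = encoder(torch.randn(8, 300))  # same class / parallel documents
negative = encoder(torch.randn(8, 300))  # different class
loss = triplet(anchor, positive, negative)  # pull positives in, push negatives away
loss.backward()
```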
NMT-based Cross-lingual Document Embeddings
This paper further adds a distance constraint to the training objective function of NV so that the two embeddings of a parallel document are required to be as close as possible.
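A plausible reading of that constraint, sketched in PyTorch: penalize the squared distance between the two embeddings of a parallel pair on top of the base objective. `base_loss` and the weight `lam` are assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def constrained_loss(base_loss, emb_src, emb_tgt, lam=1.0):
    """Base training objective plus a squared-distance penalty that keeps
    the two embeddings of a parallel document close; `lam` is an assumed
    weighting hyperparameter."""
    return base_loss + lam * F.mse_loss(emb_src, emb_tgt)

emb_src = torch.randn(8, 128, requires_grad=True)  # source-side embeddings
emb_tgt = torch.randn(8, 128, requires_grad=True)  # target-side embeddings
constrained_loss(torch.tensor(0.5), emb_src, emb_tgt, lam=0.1).backward()
```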
A Multi-task Approach to Learning Multilingual Representations
We present a novel multi-task modeling approach to learning multilingual distributed representations of text.
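Schematically, multi-task representation learning of this kind shares one encoder across task-specific heads, so every task and language updates the same representation; the sizes below are arbitrary placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Shared multilingual encoder with per-task heads; all dimensions
    here are illustrative placeholders."""
    def __init__(self, dim=300, hidden=128, n_classes=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh())
        self.classify = nn.Linear(hidden, n_classes)  # e.g., document labels
        self.rank = nn.Linear(hidden, 1)              # e.g., sentence similarity

    def forward(self, x, task):
        h = self.encoder(x)  # one representation, updated by every task
        return self.classify(h) if task == "classify" else self.rank(h)
```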
Multilingual Seq2seq Training with Similarity Loss for Cross-Lingual Document Classification
In this paper we continue experiments where neural machine translation training is used to produce joint cross-lingual fixed-dimensional sentence embeddings.
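One common form such a similarity loss takes, offered here as an assumption rather than the paper's exact formulation: interpolate the translation loss with a cosine term that pulls the two sentence embeddings together.

```python
import torch
import torch.nn.functional as F

def joint_loss(nmt_loss, emb_src, emb_tgt, lam=1.0):
    """Translation loss interpolated with a cosine term that pulls the
    fixed-dimensional embeddings of a sentence pair together; `lam` is
    an assumed interpolation weight."""
    sim = F.cosine_similarity(emb_src, emb_tgt, dim=-1).mean()
    return nmt_loss + lam * (1.0 - sim)  # term vanishes when embeddings align
```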