Cross-Lingual Document Classification

5 papers with code · Natural Language Processing
Subtask of Cross-Lingual

Greatest papers with code

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

26 Dec 2018 facebookresearch/LASER

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different language families and written in 28 different scripts. We also introduce a new test set of aligned sentences in 122 languages based on the Tatoeba corpus, and show that our sentence embeddings obtain strong results in multilingual similarity search even for low-resource languages.

CROSS-LINGUAL BITEXT MINING · CROSS-LINGUAL DOCUMENT CLASSIFICATION · CROSS-LINGUAL NATURAL LANGUAGE INFERENCE · CROSS-LINGUAL TRANSFER · DOCUMENT CLASSIFICATION · JOINT MULTILINGUAL SENTENCE REPRESENTATIONS · PARALLEL CORPUS MINING

BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

9 Oct 2014 eske/multivec

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data. Instead it trains directly on monolingual data and extracts a bilingual signal from a smaller set of raw-text sentence-aligned data.

CROSS-LINGUAL DOCUMENT CLASSIFICATION · DOCUMENT CLASSIFICATION

A Corpus for Multilingual Document Classification in Eight Languages

LREC 2018 facebookresearch/MLDoc

We have observed that the class prior distributions differ significantly between the languages. Our goal is to offer a freely available framework for evaluating cross-lingual document classification, and we hope thereby to foster research in this important area.

CROSS-LINGUAL DOCUMENT CLASSIFICATION · DOCUMENT CLASSIFICATION · SENTENCE EMBEDDINGS

Multilingual Models for Compositional Distributed Semantics

ACL 2014 karlmoritz/bicvm

We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings. Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences.

CROSS-LINGUAL DOCUMENT CLASSIFICATION · DOCUMENT CLASSIFICATION · LEARNING SEMANTIC REPRESENTATIONS

Multilingual Distributed Representations without Word Alignment

20 Dec 2013 karlmoritz/bicvm

Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP. By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not available in discrete representations, distributed representations have proven useful in many NLP tasks.

CROSS-LINGUAL DOCUMENT CLASSIFICATION · DOCUMENT CLASSIFICATION · SENTIMENT ANALYSIS