A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval

We address the problem of linking related documents across languages in a multilingual collection. We evaluate three diverse unsupervised methods to represent and compare documents: (1) multilingual topic model; (2) cross-lingual document embeddings; and (3) Wasserstein distance.We test the performance of these methods in retrieving news articles in Swedish that are known to be related to a given Finnish article.The results show that ensembles of the methods outperform the stand-alone methods, suggesting that they capture complementary characteristics of the documents

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here