Textual Representations for Crosslingual Information Retrieval

ACL (ECNLP) 2021  ·  Hang Zhang, Liling Tan

In this paper, we explored different levels of textual representation for cross-lingual information retrieval. Beyond the traditional token-level representation, we adopted subword- and character-level representations, which have been shown to improve neural machine translation by reducing out-of-vocabulary issues. We found that cross-lingual information retrieval performance can be improved by combining search results from subword- and token-level representations. Additionally, we improved search performance by combining and re-ranking the result sets from the different text representations for German, French and Japanese.
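
The abstract does not say how the result sets are combined, so the following is only an illustrative sketch of one common way to merge ranked lists from different text representations. It uses reciprocal rank fusion, which is an assumption here and not necessarily the method used in the paper. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked highly by several representations rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for the same query, one per representation.
token_hits = ["d1", "d2", "d3"]
subword_hits = ["d2", "d4", "d1"]

fused = reciprocal_rank_fusion([token_hits, subword_hits])
print(fused)
```

In this toy example, documents retrieved under both representations (`d1`, `d2`) end up ahead of those retrieved under only one, which is the general intuition behind combining result sets.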
