Current approaches therefore typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders: they fine-tune all the parameters of a pretrained massively multilingual Transformer (MMT, e.g., multilingual BERT) on English relevance judgments and then deploy it in the target language.
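As a minimal sketch of this standard transfer setup (not the exact configuration used in any particular study), the snippet below fine-tunes multilingual BERT as a pointwise relevance classifier on English query-document pairs via Hugging Face Transformers and then scores a target-language document with the same, unchanged model; the training examples, hyperparameters, and scoring details are illustrative assumptions.

```python
# Sketch: fine-tune a multilingual encoder on English relevance judgments,
# then apply it zero-shot to a target-language document.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # multilingual BERT (MMT)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# English relevance judgments: (query, document, relevance label) triples.
train_examples = [
    ("capital of france", "Paris is the capital and largest city of France.", 1),
    ("capital of france", "The Danube flows through ten countries.", 0),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for query, doc, label in train_examples:
    enc = tokenizer(query, doc, truncation=True, max_length=256, return_tensors="pt")
    out = model(**enc, labels=torch.tensor([label]))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Zero-shot deployment: score a target-language (here German) document
# with the ranker trained only on English data.
model.eval()
with torch.no_grad():
    enc = tokenizer("capital of france",
                    "Paris ist die Hauptstadt Frankreichs.",
                    truncation=True, max_length=256, return_tensors="pt")
    relevance = model(**enc).logits.softmax(-1)[0, 1].item()
print(f"relevance score: {relevance:.3f}")
```

In this cross-encoder formulation, the query and document are concatenated into a single input sequence, so the only thing that changes at deployment time is the language of the document (and possibly the query); the model parameters learned from English judgments are reused as-is.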
In this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a number of diverse language pairs.
The success of large pretrained language models (LMs) such as BERT and RoBERTa has sparked interest in probing their representations, in order to unveil what types of knowledge they implicitly capture.
Current methods of cross-lingual parser transfer focus on predicting the best parser for a low-resource target language globally, that is, "at treebank level".
In this work, we take the first step towards a comprehensive evaluation of cross-lingual word embeddings.
We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) which requires no bilingual data at all.