In the context of neural passage retrieval, we study three promising techniques: synthetic data generation, negative sampling, and fusion.
We evaluate our methods on de-noising parallel texts and training neural machine translation models.
In this paper we explore the effects of negative sampling in dual encoder models used to retrieve passages for automatic question answering.
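A common formulation of negative sampling for dual encoders is the in-batch softmax loss, where each question's gold passage is the positive and the other passages in the batch serve as negatives. The sketch below is illustrative only (function names and the scaling parameter are our own, not from the paper):

```python
import numpy as np

def in_batch_softmax_loss(q, p, scale=1.0):
    """Dual-encoder loss with in-batch negatives (illustrative sketch).

    q: (B, d) question embeddings; p: (B, d) passage embeddings.
    The passage at the same batch index as each question is its positive;
    the remaining B-1 passages in the batch act as sampled negatives.
    """
    scores = scale * (q @ p.T)                   # (B, B) similarity matrix
    scores = scores - scores.max(axis=1, keepdims=True)  # numeric stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_probs.diagonal().mean()          # mean NLL of the positives

# Toy check: aligned question/passage pairs yield a low loss,
# misaligned pairs a high one.
q = 2.0 * np.eye(4)
loss_aligned = in_batch_softmax_loss(q, q)
loss_shifted = in_batch_softmax_loss(q, np.roll(q, 1, axis=0))
```

With aligned pairs the diagonal scores dominate, so the loss is near zero; shifting the passages makes a negative outscore every positive and the loss grows accordingly.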
We introduce two pre-trained retrieval-focused multilingual sentence encoding models, based on the Transformer and CNN architectures, respectively.
On the UN document-level retrieval task, document embeddings achieve around 97% P@1 across all language pairs we evaluate.
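P@1 here is simply the fraction of queries whose top-ranked document is the correct one. A minimal sketch of the metric (the function name and input layout are our own convention):

```python
import numpy as np

def precision_at_1(scores, gold):
    """P@1: fraction of queries whose highest-scoring document is the gold one.

    scores: (num_queries, num_docs) similarity matrix.
    gold:   gold document index for each query.
    """
    return float((scores.argmax(axis=1) == np.asarray(gold)).mean())

# Three queries over two documents; the third query ranks the wrong
# document first, so P@1 is 2/3.
scores = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.6, 0.4]])
p_at_1 = precision_at_1(scores, [0, 1, 1])
```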
This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings.
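At its simplest, mining parallel pairs with bilingual sentence embeddings reduces to nearest-neighbor search under cosine similarity with a score threshold. The sketch below illustrates that baseline only; the paper's actual scoring and filtering may differ, and the threshold value is an arbitrary assumption:

```python
import numpy as np

def mine_parallel_pairs(src_emb, tgt_emb, threshold=0.8):
    """Greedy parallel-pair mining by cosine similarity (illustrative sketch).

    src_emb: (m, d) source-language sentence embeddings.
    tgt_emb: (n, d) target-language sentence embeddings.
    Returns (src_index, tgt_index) pairs whose best cosine match
    clears the threshold.
    """
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                       # (m, n) cosine similarities
    pairs = []
    for i, j in enumerate(sim.argmax(axis=1)):
        if sim[i, j] >= threshold:          # keep only confident matches
            pairs.append((i, int(j)))
    return pairs

# Toy check: target embeddings are a permutation of the source ones,
# so mining should recover that permutation.
src = np.eye(3)
tgt = src[[2, 0, 1]]
pairs = mine_parallel_pairs(src, tgt)
```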