no code implementations • EMNLP 2021 • Jing Lu, Gustavo Hernandez Abrego, Ji Ma, Jianmo Ni, Yinfei Yang
In the context of neural passage retrieval, we study three promising techniques: synthetic data generation, negative sampling, and fusion.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Gustavo Hernandez Abrego, Bowen Liang, Wei Wang, Zarana Parekh, Yinfei Yang, YunHsuan Sung
We evaluate our methods on de-noising parallel texts and training neural machine translation models.
no code implementations • 23 Oct 2020 • Jing Lu, Gustavo Hernandez Abrego, Ji Ma, Jianmo Ni, Yinfei Yang
In this paper we explore the effects of negative sampling in dual encoder models used to retrieve passages for automatic question answering.
no code implementations • ACL 2020 • Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
We introduce two pre-trained retrieval focused multilingual sentence encoding models, respectively based on the Transformer and CNN model architectures.
no code implementations • 22 Feb 2019 • Yinfei Yang, Gustavo Hernandez Abrego, Steve Yuan, Mandy Guo, Qinlan Shen, Daniel Cer, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
On the UN document-level retrieval task, document embeddings achieve around 97% on P@1 for all experimented language pairs.
no code implementations • WS 2018 • Mandy Guo, Qinlan Shen, Yinfei Yang, Heming Ge, Daniel Cer, Gustavo Hernandez Abrego, Keith Stevens, Noah Constant, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings.