no code implementations • EMNLP 2021 • Jing Lu, Gustavo Hernandez Abrego, Ji Ma, Jianmo Ni, Yinfei Yang
In the context of neural passage retrieval, we study three promising techniques: synthetic data generation, negative sampling, and fusion.
1 code implementation • EACL (AdaptNLP) 2021 • Mandy Guo, Yinfei Yang, Daniel Cer, Qinlan Shen, Noah Constant
Retrieval question answering (ReQA) is the task of retrieving a sentence-level answer to a question from an open corpus (Ahmad et al., 2019). This dataset paper presents MultiReQA, a new multi-domain ReQA evaluation suite composed of eight retrieval QA tasks drawn from publicly available QA datasets.
no code implementations • Findings (EMNLP) 2021 • Aashi Jain, Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, Yinfei Yang, Jason Baldridge
Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages.
no code implementations • 6 Apr 2022 • Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
1 code implementation • 15 Dec 2021 • Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang
Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models.
Ranked #1 on Text Summarization on BigPatent
no code implementations • 15 Dec 2021 • Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernández Ábrego, Ji Ma, Vincent Y. Zhao, Yi Luan, Keith B. Hall, Ming-Wei Chang, Yinfei Yang
With multi-stage training, surprisingly, scaling up the model size brings significant improvement on a variety of retrieval tasks, especially for out-of-domain generalization.
no code implementations • 10 Sep 2021 • Aashi Jain, Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, Yinfei Yang, Jason Baldridge
Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages.
Ranked #1 on Semantic Image Similarity on CxC
1 code implementation • EMNLP 2021 • ZiYi Yang, Yinfei Yang, Daniel Cer, Eric Darve
A simple but highly effective method, "Language Information Removal (LIR)", factors out language identity information from the semantics-related components of multilingual representations pre-trained on multi-monolingual data.
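The paper's exact factorization is not reproduced here, but the core intuition can be sketched with a simple (hypothetical) approximation: much of the language-identity signal in multilingual embeddings sits in a per-language offset, so centering each language's embeddings on its own mean leaves more language-neutral, semantics-oriented components.

```python
import numpy as np

def remove_language_mean(embeddings, languages):
    """Crude language-information removal: subtract each language's
    mean embedding, so vectors from different languages become
    directly comparable on their remaining semantic components."""
    embeddings = np.asarray(embeddings, dtype=float)
    out = embeddings.copy()
    for lang in set(languages):
        idx = [i for i, l in enumerate(languages) if l == lang]
        out[idx] -= embeddings[idx].mean(axis=0)
    return out

# Toy data: two "languages" share the same semantic components
# but each carries a constant language-specific offset.
semantic = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
offsets = np.array([[5.0, 5.0], [5.0, 5.0], [-3.0, 2.0], [-3.0, 2.0]])
emb = semantic + offsets
langs = ["en", "en", "de", "de"]

cleaned = remove_language_mean(emb, langs)
# After centering, the cross-lingual pairs with the same meaning
# (rows 0/2 and rows 1/3) collapse onto identical vectors.
```

This is only a mean-centering sketch; the paper's LIR operates on learned components rather than raw per-language means.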
1 code implementation • Findings (ACL) 2022 • Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, Yinfei Yang
To support our investigation, we establish a new sentence representation transfer benchmark, SentGLUE, which extends the SentEval toolkit to nine tasks from the GLUE benchmark.
1 code implementation • ICCV 2021 • Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals.
2 code implementations • 11 Feb 2021 • Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yun-Hsuan Sung, Zhen Li, Tom Duerig
In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without the expensive filtering or post-processing steps used in the Conceptual Captions dataset.
Ranked #1 on Image Classification on VTAB-1k (using extra training data)
1 code implementation • CVPR 2021 • Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
The quality of XMC-GAN's output is a major step up from previous models, as we show on three challenging datasets.
no code implementations • 1 Jan 2021 • ZiYi Yang, Yinfei Yang, Daniel M Cer, Jax Law, Eric Darve
This paper presents a novel training method, Conditional Masked Language Modeling (CMLM), to effectively learn sentence representations on large scale unlabeled corpora.
no code implementations • EMNLP 2021 • ZiYi Yang, Yinfei Yang, Daniel Cer, Jax Law, Eric Darve
This paper presents a novel training method, Conditional Masked Language Modeling (CMLM), to effectively learn sentence representations on large scale unlabeled corpora.
no code implementations • AACL 2020 • Gustavo Hernandez Abrego, Bowen Liang, Wei Wang, Zarana Parekh, Yinfei Yang, Yun-Hsuan Sung
We evaluate our methods on de-noising parallel texts and training neural machine translation models.
no code implementations • 7 Nov 2020 • Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
Localized Narratives is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases.
no code implementations • 23 Oct 2020 • Jing Lu, Gustavo Hernandez Abrego, Ji Ma, Jianmo Ni, Yinfei Yang
In this paper we explore the effects of negative sampling in dual encoder models used to retrieve passages for automatic question answering.
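A common baseline that negative-sampling studies like this one compare against is the in-batch negatives setup: each question's paired passage is its positive, and every other passage in the batch serves as a negative. A minimal numpy sketch (not the paper's exact training code):

```python
import numpy as np

def in_batch_softmax_loss(q_emb, p_emb):
    """Dual-encoder loss with in-batch negatives: mean cross-entropy
    over row-wise softmaxes of the question-passage score matrix,
    where the positives sit on the diagonal."""
    q = np.asarray(q_emb, dtype=float)
    p = np.asarray(p_emb, dtype=float)
    scores = q @ p.T                              # [batch, batch] dot products
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_probs.diagonal().mean()

# Toy batch: when question and passage embeddings align, the loss
# is lower than when every positive is mismatched.
loss_aligned = in_batch_softmax_loss(np.eye(3), np.eye(3))
loss_shuffled = in_batch_softmax_loss(np.eye(3), np.roll(np.eye(3), 1, axis=0))
```

Harder negatives (e.g. mined from a retrieval index rather than sampled from the batch) plug into the same loss by appending extra columns to the score matrix.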
no code implementations • ACL 2021 • Yinfei Yang, Ning Jin, Kuo Lin, Mandy Guo, Daniel Cer
Independently computing embeddings for questions and answers results in late fusion of information related to matching questions to their answers.
5 code implementations • ACL 2022 • Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang
While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding based transfer learning (Reimers and Gurevych, 2019), BERT based cross-lingual sentence embeddings have yet to be explored.
1 code implementation • 13 May 2020 • Forrest Sheng Bao, Hebi Li, Ge Luo, Minghui Qiu, Yinfei Yang, Youbiao He, Cen Chen
Canonical automatic summary evaluation metrics, such as ROUGE, focus on lexical similarity which cannot well capture semantics nor linguistic quality and require a reference summary which is costly to obtain.
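The lexical-similarity limitation is easy to see concretely. A minimal ROUGE-1 F1 implementation (unigram overlap only, no stemming or synonym handling) gives a perfect paraphrase zero credit:

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "the cat sat on the mat".split()
paraphrase = "a feline rested upon a rug".split()
verbatim = "the cat sat on the mat".split()

rouge1_f(paraphrase, reference)  # 0.0 — same meaning, no lexical overlap
rouge1_f(verbatim, reference)    # 1.0 — identical wording
```

This is exactly the failure mode motivating semantics-aware, reference-free evaluation.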
1 code implementation • 5 May 2020 • Mandy Guo, Yinfei Yang, Daniel Cer, Qinlan Shen, Noah Constant
Retrieval question answering (ReQA) is the task of retrieving a sentence-level answer to a question from an open corpus (Ahmad et al., 2019). This paper presents MultiReQA, a new multi-domain ReQA evaluation suite composed of eight retrieval QA tasks drawn from publicly available QA datasets.
1 code implementation • EACL 2021 • Zarana Parekh, Jason Baldridge, Daniel Cer, Austin Waters, Yinfei Yang
By supporting multi-modal retrieval training and evaluation, image captioning datasets have spurred remarkable progress on representation learning.
no code implementations • EACL 2021 • Ji Ma, Ivan Korotkov, Yinfei Yang, Keith Hall, Ryan Mcdonald
The question generation system is trained on general domain data, but is applied to documents in the targeted domain.
1 code implementation • EMNLP 2020 • Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang
We present LAReQA, a challenging new benchmark for language-agnostic answer retrieval from a multilingual candidate pool.
no code implementations • CL (ACL) 2021 • Oshin Agarwal, Yinfei Yang, Byron C. Wallace, Ani Nenkova
We examine these questions by contrasting the performance of several LSTM-CRF architecture variants for named entity recognition, some of which are given only representations of the context as features.
1 code implementation • 8 Apr 2020 • Oshin Agarwal, Yinfei Yang, Byron C. Wallace, Ani Nenkova
We propose a method for auditing the in-domain robustness of systems, focusing specifically on differences in performance due to the national origin of entities.
2 code implementations • IJCNLP 2019 • Yinfei Yang, Yuan Zhang, Chris Tar, Jason Baldridge
Most existing work on adversarial data generation focuses on English.
no code implementations • ACL 2020 • Wei Wang, Ye Tian, Jiquan Ngiam, Yinfei Yang, Isaac Caswell, Zarana Parekh
Most data selection research in machine translation focuses on improving a single domain.
1 code implementation • WS 2019 • Amin Ahmad, Noah Constant, Yinfei Yang, Daniel Cer
Popular QA benchmarks like SQuAD have driven progress on the task of identifying answer spans within a specific passage, with models now surpassing human performance.
no code implementations • ACL 2020 • Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
We introduce two pre-trained retrieval focused multilingual sentence encoding models, respectively based on the Transformer and CNN model architectures.
no code implementations • WS 2019 • Mandy Guo, Yinfei Yang, Keith Stevens, Daniel Cer, Heming Ge, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
We explore using multilingual document embeddings for nearest neighbor mining of parallel data.
no code implementations • NAACL 2019 • Yinfei Yang, Oshin Agarwal, Chris Tar, Byron C. Wallace, Ani Nenkova
Experiments on a complex biomedical information extraction task using expert and lay annotators show that: (i) simply excluding from the training data instances predicted to be difficult yields a small boost in performance; (ii) using difficulty scores to weight instances during training provides further, consistent gains; (iii) assigning instances predicted to be difficult to domain experts is an effective strategy for task routing.
no code implementations • 22 Feb 2019 • Yinfei Yang, Gustavo Hernandez Abrego, Steve Yuan, Mandy Guo, Qinlan Shen, Daniel Cer, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
On the UN document-level retrieval task, document embeddings achieve around 97% on P@1 for all experimented language pairs.
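The P@1 metric used here is straightforward: for aligned query/document pairs, it is the fraction of queries whose top-scoring document is the one they are paired with. A small sketch (toy embeddings, not the paper's setup):

```python
import numpy as np

def precision_at_1(query_emb, doc_emb):
    """P@1 for aligned pairs: row i of query_emb is paired with
    row i of doc_emb; score by dot product, count how often the
    argmax retrieves the correct document."""
    scores = np.asarray(query_emb) @ np.asarray(doc_emb).T
    predictions = scores.argmax(axis=1)
    return (predictions == np.arange(len(scores))).mean()

perfect = precision_at_1(np.eye(4), np.eye(4))  # 1.0

queries = np.eye(4)
queries[[2, 3]] = queries[[3, 2]]  # two queries retrieve each other's doc
half = precision_at_1(queries, np.eye(4))  # 0.5
```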
no code implementations • EMNLP 2018 • Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Brian Strope, Ray Kurzweil
We present easy-to-use TensorFlow Hub sentence embedding models having good task transfer performance.
no code implementations • WS 2019 • Muthuraman Chidambaram, Yinfei Yang, Daniel Cer, Steve Yuan, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
A significant roadblock in multilingual neural language modeling is the lack of labeled non-English data.
no code implementations • 29 Aug 2018 • Cen Chen, Minghui Qiu, Yinfei Yang, Jun Zhou, Jun Huang, Xiaolong Li, Forrest Bao
Product reviews, predominantly in text form, significantly help consumers finalize their purchasing decisions.
no code implementations • WS 2018 • Mandy Guo, Qinlan Shen, Yinfei Yang, Heming Ge, Daniel Cer, Gustavo Hernandez Abrego, Keith Stevens, Noah Constant, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings.
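The basic mining loop behind such approaches can be sketched simply: embed sentences from both languages into a shared space, pair each source sentence with its nearest target sentence by cosine similarity, and keep only pairs above a confidence threshold. A hypothetical minimal version (the paper additionally uses refinements such as margin-based scoring):

```python
import numpy as np

def mine_pairs(src_emb, tgt_emb, threshold=0.8):
    """Greedy nearest-neighbor mining: for each source sentence,
    take its most cosine-similar target sentence, keeping pairs
    whose similarity clears the threshold."""
    src = np.asarray(src_emb, dtype=float)
    tgt = np.asarray(tgt_emb, dtype=float)
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = src @ tgt.T
    pairs = []
    for i, row in enumerate(sims):
        j = int(row.argmax())
        if row[j] >= threshold:
            pairs.append((i, j, float(row[j])))
    return pairs

src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
tgt = np.array([[0.0, 2.0], [3.0, 0.0]])  # different norms; cosine ignores scale
pairs = mine_pairs(src, tgt, threshold=0.8)
# src[2] matches tgt[0] at cos ~0.707, below threshold, so it is dropped
```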
3 code implementations • ACL 2018 • Benjamin Nye, Junyi Jessy Li, Roma Patel, Yinfei Yang, Iain J. Marshall, Ani Nenkova, Byron C. Wallace
We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials.
no code implementations • NAACL 2018 • Cen Chen, Yinfei Yang, Jun Zhou, Xiaolong Li, Forrest Sheng Bao
With the growing amount of reviews in e-commerce websites, it is critical to assess the helpfulness of reviews and recommend them accordingly to consumers.
no code implementations • NAACL 2018 • Roma Patel, Yinfei Yang, Iain Marshall, Ani Nenkova, Byron Wallace
Medical professionals search the published literature by specifying the type of patients, the medical intervention(s) and the outcome measure(s) of interest.
1 code implementation • WS 2018 • Yinfei Yang, Steve Yuan, Daniel Cer, Sheng-yi Kong, Noah Constant, Petr Pilar, Heming Ge, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
We present a novel approach to learn representations for sentence-level semantic similarity using conversational data.
22 code implementations • 29 Mar 2018 • Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance.
Ranked #1 on Text Classification on TREC-6
Tasks: Conversational Response Selection, Semantic Textual Similarity, +6
no code implementations • 3 Apr 2017 • Yinfei Yang, Ani Nenkova
On manually annotated data, we compare the performance of domain-specific classifiers, trained on data only from a given news domain and a general classifier in which data from all four domains is pooled together.
no code implementations • EACL 2017 • Yinfei Yang, Cen Chen, Minghui Qiu, Forrest Bao
Recent work on aspect extraction is leveraging the hierarchical relationship between products and their categories.
no code implementations • EACL 2017 • Yinfei Yang, Forrest Sheng Bao, Ani Nenkova
We present a robust approach for detecting intrinsic sentence importance in news, by training on two corpora of document-summary pairs.