no code implementations • Findings (EMNLP) 2021 • Aashi Jain, Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, Yinfei Yang, Jason Baldridge
Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages.
1 code implementation • EACL (AdaptNLP) 2021 • Mandy Guo, Yinfei Yang, Daniel Cer, Qinlan Shen, Noah Constant
Retrieval question answering (ReQA) is the task of retrieving a sentence-level answer to a question from an open corpus (Ahmad et al., 2019). This dataset paper presents MultiReQA, a new multi-domain ReQA evaluation suite composed of eight retrieval QA tasks drawn from publicly available QA datasets.
no code implementations • EMNLP 2021 • Jing Lu, Gustavo Hernandez Abrego, Ji Ma, Jianmo Ni, Yinfei Yang
In the context of neural passage retrieval, we study three promising techniques: synthetic data generation, negative sampling, and fusion.
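The negative-sampling technique can be illustrated with a minimal in-batch softmax sketch (illustrative only; the function name and scale value are assumptions, not the paper's implementation):

```python
import numpy as np

def in_batch_softmax_loss(q, p, scale=20.0):
    """Dual-encoder loss with in-batch negatives: question embeddings q[i]
    and passage embeddings p[i] are aligned by row, and every other passage
    in the batch acts as a negative for question i."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)  # normalize for cosine similarity
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    scores = scale * (q @ p.T)                        # [batch, batch] similarity matrix
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # positives sit on the diagonal
```

Matching question/passage pairs push the diagonal scores up relative to the rest of the batch, so a well-trained encoder pair drives this loss toward zero.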
no code implementations • 2 Jul 2024 • Elmira Amirloo, Jean-Philippe Fauconnier, Christoph Roesmann, Christian Kerl, Rinu Boney, Yusu Qian, ZiRui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan, Peter Grasch
A primary objective of alignment for MLLMs is to encourage these models to align responses more closely with image information.
no code implementations • 1 Jul 2024 • Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier, Peter Grasch, Yinfei Yang, Zhe Gan
We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions.
no code implementations • 11 Apr 2024 • Haotian Zhang, Haoxuan You, Philipp Dufter, BoWen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang
While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it has certain limitations: it is constrained by its pre-trained, fixed visual encoder and fails to perform well on broader tasks.
Ranked #83 on Visual Question Answering on MM-Vet
no code implementations • 8 Apr 2024 • Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan
For model evaluation, we establish a comprehensive benchmark encompassing all the aforementioned tasks.
no code implementations • 14 Mar 2024 • Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, BoWen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, ZiRui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang
Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons.
Ranked #30 on Visual Question Answering on MM-Vet
1 code implementation • 20 Feb 2024 • Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan
Despite their remarkable advancements, Multimodal Large Language Models (MLLMs) remain vulnerable to deceptive information in prompts, which can lead them to produce hallucinated responses.
1 code implementation • 11 Oct 2023 • Zhengfeng Lai, Haotian Zhang, BoWen Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao
For example, VeCLIP achieves up to a +25.2% gain on the COCO and Flickr30k retrieval tasks under the 12M setting.
2 code implementations • 11 Oct 2023 • Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, BoWen Zhang, ZiRui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang
We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions.
1 code implementation • 2 Oct 2023 • Ajay Jaiswal, Zhe Gan, Xianzhi Du, BoWen Zhang, Zhangyang Wang, Yinfei Yang
Recently, several works have shown significant success in training-free and data-free compression (pruning and quantization) of LLMs that achieve 50-60% sparsity and reduce the bit width to 3 or 4 bits per weight, with negligible degradation of perplexity over the uncompressed baseline.
2 code implementations • 29 Sep 2023 • Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan
Extensive experimental results demonstrate that expressive instructions are crucial to instruction-based image editing, and our MGIE can lead to a notable improvement in automatic metrics and human evaluation while maintaining competitive inference efficiency.
no code implementations • 8 Sep 2023 • Erik Daxberger, Floris Weers, BoWen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du
We empirically show that our sparse Mobile Vision MoEs (V-MoEs) can achieve a better trade-off between performance and efficiency than the corresponding dense ViTs.
1 code implementation • 13 Jun 2023 • Wentao Wu, Aleksei Timofeev, Chen Chen, BoWen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang
Our approach involves employing a named entity recognition model to extract entities from the alt-text, and then using a CLIP model to select the correct entities as labels of the paired image.
1 code implementation • 8 May 2023 • Liangliang Cao, BoWen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng
In this paper, we discuss two effective approaches to improve the efficiency and robustness of CLIP training: (1) augmenting the training dataset while maintaining the same number of optimization steps, and (2) filtering out samples that contain text regions in the image.
no code implementations • 10 Apr 2023 • Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev
Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text.
no code implementations • 30 Jan 2023 • Chen Chen, BoWen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang
We extend the CLIP model and build a sparse text and image representation (STAIR), where the image and text are mapped to a sparse token space.
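The mapping into a sparse token space can be sketched roughly as follows (a sketch of the general idea only; `vocab_proj` and the top-k sparsification are assumptions, not the STAIR architecture):

```python
import numpy as np

def to_sparse_token_space(dense, vocab_proj, top_k=8):
    """Project a dense embedding through a (here hypothetical) learned
    matrix into a vocabulary-sized space, then keep only the strongest
    activations so the result is a sparse, interpretable token vector."""
    logits = dense @ vocab_proj                  # one activation per vocabulary token
    weights = np.log1p(np.maximum(logits, 0.0))  # non-negative, dampened activations
    keep = np.argsort(weights)[-top_k:]          # indices of the top-k tokens
    sparse = np.zeros_like(weights)
    sparse[keep] = weights[keep]
    return sparse
```

Because each nonzero dimension corresponds to a vocabulary token, the resulting vector can be both indexed with inverted-index machinery and inspected by humans.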
no code implementations • CVPR 2023 • Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter
Self supervision and natural language supervision have emerged as two exciting ways to train general purpose image encoders which excel at a variety of downstream tasks.
1 code implementation • 20 Dec 2022 • Forrest Sheng Bao, Ruixuan Tu, Ge Luo, Yinfei Yang, Hebi Li, Minghui Qiu, Youbiao He, Cen Chen
Automated summary quality assessment falls into two categories: reference-based and reference-free.
2 code implementations • ICCV 2023 • Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jonathon Shlens
In this work we examine how well vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
no code implementations • CVPR 2023 • Aishwarya Kamath, Peter Anderson, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh
Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments, as a step towards robots that can follow human instructions.
Ranked #1 on Vision and Language Navigation on RxR (using extra training data)
2 code implementations • 22 Jun 2022 • Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu
We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.
Ranked #1 on Text-to-Image Generation on LAION COCO
1 code implementation • 6 Apr 2022 • Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
2 code implementations • Findings (NAACL) 2022 • Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang
Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models.
Ranked #1 on Text Summarization on BigPatent
2 code implementations • 15 Dec 2021 • Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernández Ábrego, Ji Ma, Vincent Y. Zhao, Yi Luan, Keith B. Hall, Ming-Wei Chang, Yinfei Yang
With multi-stage training, surprisingly, scaling up the model size brings significant improvement on a variety of retrieval tasks, especially for out-of-domain generalization.
Ranked #9 on Zero-shot Text Search on BEIR
1 code implementation • EMNLP 2021 • ZiYi Yang, Yinfei Yang, Daniel Cer, Eric Darve
A simple but highly effective method, Language Information Removal (LIR), factors out language-identity information from semantically related components in multilingual representations pre-trained on multi-monolingual data.
2 code implementations • Findings (ACL) 2022 • Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, Yinfei Yang
To support our investigation, we establish a new sentence representation transfer benchmark, SentGLUE, which extends the SentEval toolkit to nine tasks from the GLUE benchmark.
1 code implementation • ICCV 2021 • Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals.
4 code implementations • 11 Feb 2021 • Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, YunHsuan Sung, Zhen Li, Tom Duerig
In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without the expensive filtering or post-processing steps used in the Conceptual Captions dataset.
Ranked #1 on Image Classification on VTAB-1k (using extra training data)
1 code implementation • CVPR 2021 • Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
The quality of XMC-GAN's output is a major step up from previous models, as we show on three challenging datasets.
Ranked #27 on Text-to-Image Generation on MS COCO (using extra training data)
no code implementations • EMNLP 2021 • ZiYi Yang, Yinfei Yang, Daniel Cer, Jax Law, Eric Darve
This paper presents a novel training method, Conditional Masked Language Modeling (CMLM), to effectively learn sentence representations on large scale unlabeled corpora.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Gustavo Hernandez Abrego, Bowen Liang, Wei Wang, Zarana Parekh, Yinfei Yang, YunHsuan Sung
We evaluate our methods on de-noising parallel texts and training neural machine translation models.
no code implementations • 7 Nov 2020 • Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
Localized Narratives is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases.
no code implementations • 23 Oct 2020 • Jing Lu, Gustavo Hernandez Abrego, Ji Ma, Jianmo Ni, Yinfei Yang
In this paper we explore the effects of negative sampling in dual encoder models used to retrieve passages for automatic question answering.
no code implementations • ACL 2021 • Yinfei Yang, Ning Jin, Kuo Lin, Mandy Guo, Daniel Cer
Computing embeddings for questions and answers independently results in late fusion of the information needed to match questions to their answers.
6 code implementations • ACL 2022 • Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang
While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding based transfer learning (Reimers and Gurevych, 2019), BERT based cross-lingual sentence embeddings have yet to be explored.
1 code implementation • NAACL 2022 • Forrest Sheng Bao, Hebi Li, Ge Luo, Minghui Qiu, Yinfei Yang, Youbiao He, Cen Chen
Canonical automatic summary evaluation metrics, such as ROUGE, focus on lexical similarity which cannot well capture semantics nor linguistic quality and require a reference summary which is costly to obtain.
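The lexical-similarity focus of metrics like ROUGE can be illustrated with a minimal unigram F1 sketch (simplified; real ROUGE implementations add stemming, n-gram variants, and other options):

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """Unigram ROUGE F1: harmonic mean of unigram precision and recall.
    Because it counts surface tokens only, a perfect paraphrase can score 0."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For instance, `rouge1_f("the cat sat", "the cat sat")` is 1.0, while the paraphrase `"a feline rested"` scores 0.0 against the same reference, which is exactly the failure mode that semantics-aware, reference-free evaluation aims to address.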
2 code implementations • EACL 2021 • Zarana Parekh, Jason Baldridge, Daniel Cer, Austin Waters, Yinfei Yang
By supporting multi-modal retrieval training and evaluation, image captioning datasets have spurred remarkable progress on representation learning.
no code implementations • EACL 2021 • Ji Ma, Ivan Korotkov, Yinfei Yang, Keith Hall, Ryan Mcdonald
The question generation system is trained on general domain data, but is applied to documents in the targeted domain.
1 code implementation • EMNLP 2020 • Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang
We present LAReQA, a challenging new benchmark for language-agnostic answer retrieval from a multilingual candidate pool.
no code implementations • CL (ACL) 2021 • Oshin Agarwal, Yinfei Yang, Byron C. Wallace, Ani Nenkova
We examine these questions by contrasting the performance of several variants of LSTM-CRF architectures for named entity recognition, with some provided only representations of the context as features.
1 code implementation • 8 Apr 2020 • Oshin Agarwal, Yinfei Yang, Byron C. Wallace, Ani Nenkova
We propose a method for auditing the in-domain robustness of systems, focusing specifically on differences in performance due to the national origin of entities.
3 code implementations • IJCNLP 2019 • Yinfei Yang, Yuan Zhang, Chris Tar, Jason Baldridge
Most existing work on adversarial data generation focuses on English.
no code implementations • ACL 2020 • Wei Wang, Ye Tian, Jiquan Ngiam, Yinfei Yang, Isaac Caswell, Zarana Parekh
Most data selection research in machine translation focuses on improving a single domain.
1 code implementation • WS 2019 • Amin Ahmad, Noah Constant, Yinfei Yang, Daniel Cer
Popular QA benchmarks like SQuAD have driven progress on the task of identifying answer spans within a specific passage, with models now surpassing human performance.
no code implementations • ACL 2020 • Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
We introduce two pre-trained retrieval focused multilingual sentence encoding models, respectively based on the Transformer and CNN model architectures.
no code implementations • WS 2019 • Mandy Guo, Yinfei Yang, Keith Stevens, Daniel Cer, Heming Ge, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
We explore using multilingual document embeddings for nearest neighbor mining of parallel data.
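The mining recipe amounts to nearest-neighbor search over document embeddings, roughly as below (a simplified sketch; the brute-force search and fixed threshold are assumptions — large-scale systems use approximate search and margin-based scoring):

```python
import numpy as np

def mine_parallel_docs(src_emb, tgt_emb, threshold=0.8):
    """Pair each source-language document with its nearest target-language
    document by cosine similarity, keeping only pairs above a confidence
    threshold. Embeddings are row-wise; they are normalized here for cosine."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = src @ tgt.T                 # [n_src, n_tgt] cosine similarities
    best = sims.argmax(axis=1)         # nearest target for each source doc
    return [(i, int(j)) for i, j in enumerate(best) if sims[i, j] >= threshold]
```

The threshold trades recall for precision: raising it keeps fewer, higher-confidence document pairs for downstream sentence-level alignment.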
no code implementations • NAACL 2019 • Yinfei Yang, Oshin Agarwal, Chris Tar, Byron C. Wallace, Ani Nenkova
Experiments on a complex biomedical information extraction task using expert and lay annotators show that: (i) simply excluding from the training data instances predicted to be difficult yields a small boost in performance; (ii) using difficulty scores to weight instances during training provides further, consistent gains; (iii) assigning instances predicted to be difficult to domain experts is an effective strategy for task routing.
no code implementations • 22 Feb 2019 • Yinfei Yang, Gustavo Hernandez Abrego, Steve Yuan, Mandy Guo, Qinlan Shen, Daniel Cer, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
On the UN document-level retrieval task, document embeddings achieve around 97% on P@1 for all experimented language pairs.
no code implementations • EMNLP 2018 • Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Brian Strope, Ray Kurzweil
We present easy-to-use TensorFlow Hub sentence embedding models having good task transfer performance.
no code implementations • WS 2019 • Muthuraman Chidambaram, Yinfei Yang, Daniel Cer, Steve Yuan, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
A significant roadblock in multilingual neural language modeling is the lack of labeled non-English data.
no code implementations • 29 Aug 2018 • Cen Chen, Minghui Qiu, Yinfei Yang, Jun Zhou, Jun Huang, Xiaolong Li, Forrest Bao
Product reviews, in the form of texts dominantly, significantly help consumers finalize their purchasing decisions.
no code implementations • WS 2018 • Mandy Guo, Qinlan Shen, Yinfei Yang, Heming Ge, Daniel Cer, Gustavo Hernandez Abrego, Keith Stevens, Noah Constant, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings.
2 code implementations • ACL 2018 • Benjamin Nye, Junyi Jessy Li, Roma Patel, Yinfei Yang, Iain J. Marshall, Ani Nenkova, Byron C. Wallace
We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials.
no code implementations • NAACL 2018 • Cen Chen, Yinfei Yang, Jun Zhou, Xiaolong Li, Forrest Sheng Bao
With the growing amount of reviews in e-commerce websites, it is critical to assess the helpfulness of reviews and recommend them accordingly to consumers.
no code implementations • NAACL 2018 • Roma Patel, Yinfei Yang, Iain Marshall, Ani Nenkova, Byron Wallace
Medical professionals search the published literature by specifying the type of patients, the medical intervention(s) and the outcome measure(s) of interest.
1 code implementation • WS 2018 • Yinfei Yang, Steve Yuan, Daniel Cer, Sheng-yi Kong, Noah Constant, Petr Pilar, Heming Ge, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
We present a novel approach to learn representations for sentence-level semantic similarity using conversational data.
24 code implementations • 29 Mar 2018 • Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance.
Ranked #2 on Text Classification on TREC-6
Conversational Response Selection Semantic Textual Similarity +7
no code implementations • 3 Apr 2017 • Yinfei Yang, Ani Nenkova
On manually annotated data, we compare the performance of domain-specific classifiers, trained on data only from a given news domain and a general classifier in which data from all four domains is pooled together.
no code implementations • EACL 2017 • Yinfei Yang, Cen Chen, Minghui Qiu, Forrest Bao
Recent work on aspect extraction is leveraging the hierarchical relationship between products and their categories.
no code implementations • EACL 2017 • Yinfei Yang, Forrest Sheng Bao, Ani Nenkova
We present a robust approach for detecting intrinsic sentence importance in news, by training on two corpora of document-summary pairs.