Search Results for author: Yinfei Yang

Found 67 papers, 30 papers with code

MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models

1 code implementation EACL (AdaptNLP) 2021 Mandy Guo, Yinfei Yang, Daniel Cer, Qinlan Shen, Noah Constant

Retrieval question answering (ReQA) is the task of retrieving a sentence-level answer to a question from an open corpus (Ahmad et al., 2019). This dataset paper presents MultiReQA, a new multi-domain ReQA evaluation suite composed of eight retrieval QA tasks drawn from publicly available QA datasets.

Information Retrieval Question Answering +3

Multi-stage Training with Improved Negative Contrast for Neural Passage Retrieval

no code implementations EMNLP 2021 Jing Lu, Gustavo Hernandez Abrego, Ji Ma, Jianmo Ni, Yinfei Yang

In the context of neural passage retrieval, we study three promising techniques: synthetic data generation, negative sampling, and fusion.

Passage Retrieval Retrieval +1

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

no code implementations 1 Jul 2024 Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier, Peter Grasch, Yinfei Yang, Zhe Gan

We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions.

Instruction Following

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

no code implementations 11 Apr 2024 Haotian Zhang, Haoxuan You, Philipp Dufter, BoWen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it has certain limitations: it is constrained by its fixed pre-trained visual encoder and fails to perform well on broader tasks.

Language Modelling Large Language Model +1

How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

1 code implementation 20 Feb 2024 Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan

The remarkable advancements in Multimodal Large Language Models (MLLMs) have not rendered them immune to challenges, particularly in handling deceptive information in prompts, under which they produce hallucinated responses.

Ferret: Refer and Ground Anything Anywhere at Any Granularity

2 code implementations 11 Oct 2023 Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, BoWen Zhang, ZiRui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang

We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions.

Hallucination Language Modelling +2

Compressing LLMs: The Truth is Rarely Pure and Never Simple

1 code implementation 2 Oct 2023 Ajay Jaiswal, Zhe Gan, Xianzhi Du, BoWen Zhang, Zhangyang Wang, Yinfei Yang

Recently, several works have shown significant success in training-free and data-free compression (pruning and quantization) of LLMs, achieving 50-60% sparsity and reducing the bit width to 3 or 4 bits per weight with negligible degradation of perplexity relative to the uncompressed baseline.

Quantization Retrieval

Guiding Instruction-based Image Editing via Multimodal Large Language Models

2 code implementations 29 Sep 2023 Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan

Extensive experimental results demonstrate that expressive instructions are crucial to instruction-based image editing, and our MGIE (MLLM-Guided Image Editing) yields a notable improvement in automatic metrics and human evaluation while maintaining competitive inference efficiency.

Image Manipulation Response Generation

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

no code implementations 8 Sep 2023 Erik Daxberger, Floris Weers, BoWen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du

We empirically show that our sparse Mobile Vision MoEs (V-MoEs) can achieve a better trade-off between performance and efficiency than the corresponding dense ViTs.

MOFI: Learning Image Representations from Noisy Entity Annotated Images

1 code implementation 13 Jun 2023 Wentao Wu, Aleksei Timofeev, Chen Chen, BoWen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang

Our approach involves employing a named entity recognition model to extract entities from the alt-text, and then using a CLIP model to select the correct entities as labels of the paired image.

Image Classification Image Retrieval +3

Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness

1 code implementation 8 May 2023 Liangliang Cao, BoWen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng

In this paper, we discuss two effective approaches to improve the efficiency and robustness of CLIP training: (1) augmenting the training dataset while maintaining the same number of optimization steps, and (2) filtering out samples that contain text regions in the image.

Adversarial Text Retrieval

On Robustness in Multimodal Learning

no code implementations 10 Apr 2023 Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev

Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text.

Representation Learning

Masked Autoencoding Does Not Help Natural Language Supervision at Scale

no code implementations CVPR 2023 Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter

Self-supervision and natural language supervision have emerged as two exciting ways to train general-purpose image encoders that excel at a variety of downstream tasks.

A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

no code implementations CVPR 2023 Aishwarya Kamath, Peter Anderson, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh

Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments, as a step towards robots that can follow human instructions.

Ranked #1 on Vision and Language Navigation on RxR (using extra training data)

Imitation Learning Instruction Following +1

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

2 code implementations 22 Jun 2022 Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu

We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.

Decoder Machine Translation +2

Large Dual Encoders Are Generalizable Retrievers

2 code implementations 15 Dec 2021 Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernández Ábrego, Ji Ma, Vincent Y. Zhao, Yi Luan, Keith B. Hall, Ming-Wei Chang, Yinfei Yang

With multi-stage training, surprisingly, scaling up the model size brings significant improvement on a variety of retrieval tasks, especially for out-of-domain generalization.

Domain Generalization Retrieval +1

A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations

1 code implementation EMNLP 2021 ZiYi Yang, Yinfei Yang, Daniel Cer, Eric Darve

A simple but highly effective method, Language Information Removal (LIR), factors out language identity information from semantics-related components in multilingual representations pre-trained on multi-monolingual data.

Cross-Lingual Transfer Retrieval

MURAL: Multimodal, Multitask Retrieval Across Languages

no code implementations 10 Sep 2021 Aashi Jain, Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, Yinfei Yang, Jason Baldridge

Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages.

Image-text matching Retrieval +5

Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models

2 code implementations Findings (ACL) 2022 Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, Yinfei Yang

To support our investigation, we establish a new sentence representation transfer benchmark, SentGLUE, which extends the SentEval toolkit to nine tasks from the GLUE benchmark.

Contrastive Learning Decoder +4

Pathdreamer: A World Model for Indoor Navigation

1 code implementation ICCV 2021 Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals.

Semantic Segmentation Vision and Language Navigation

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

4 code implementations 11 Feb 2021 Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, YunHsuan Sung, Zhen Li, Tom Duerig

In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without the expensive filtering or post-processing steps used to build the Conceptual Captions dataset.

Ranked #1 on Image Classification on VTAB-1k (using extra training data)

Cross-Modal Retrieval Fine-Grained Image Classification +6

Universal Sentence Representations Learning with Conditional Masked Language Model

no code implementations 1 Jan 2021 ZiYi Yang, Yinfei Yang, Daniel M Cer, Jax Law, Eric Darve

This paper presents a novel training method, Conditional Masked Language Modeling (CMLM), to effectively learn sentence representations on large-scale unlabeled corpora.

Language Modelling Masked Language Modeling +4

Universal Sentence Representation Learning with Conditional Masked Language Model

no code implementations EMNLP 2021 ZiYi Yang, Yinfei Yang, Daniel Cer, Jax Law, Eric Darve

This paper presents a novel training method, Conditional Masked Language Modeling (CMLM), to effectively learn sentence representations on large-scale unlabeled corpora.

Language Modelling Masked Language Modeling +4

Text-to-Image Generation Grounded by Fine-Grained User Attention

no code implementations 7 Nov 2020 Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang

Localized Narratives is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases.

Position Retrieval +3

Neural Passage Retrieval with Improved Negative Contrast

no code implementations 23 Oct 2020 Jing Lu, Gustavo Hernandez Abrego, Ji Ma, Jianmo Ni, Yinfei Yang

In this paper, we explore the effects of negative sampling in dual encoder models used to retrieve passages for automatic question answering.

Open-Domain Question Answering Passage Retrieval +3

Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation

no code implementations ACL 2021 Yinfei Yang, Ning Jin, Kuo Lin, Mandy Guo, Daniel Cer

Independently computing embeddings for questions and answers results in late fusion of information related to matching questions to their answers.

Data Augmentation Question Answering +1

Language-agnostic BERT Sentence Embedding

6 code implementations ACL 2022 Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang

While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding-based transfer learning (Reimers and Gurevych, 2019), BERT-based cross-lingual sentence embeddings have yet to be explored.

Language Modelling Masked Language Modeling +10

SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling

1 code implementation NAACL 2022 Forrest Sheng Bao, Hebi Li, Ge Luo, Minghui Qiu, Yinfei Yang, Youbiao He, Cen Chen

Canonical automatic summary evaluation metrics, such as ROUGE, focus on lexical similarity, which cannot capture semantics or linguistic quality well, and they require a reference summary, which is costly to obtain.

Abstractive Text Summarization Document Embedding +3

MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models

1 code implementation 5 May 2020 Mandy Guo, Yinfei Yang, Daniel Cer, Qinlan Shen, Noah Constant

Retrieval question answering (ReQA) is the task of retrieving a sentence-level answer to a question from an open corpus (Ahmad et al., 2019). This paper presents MultiReQA, a new multi-domain ReQA evaluation suite composed of eight retrieval QA tasks drawn from publicly available QA datasets.

Information Retrieval Question Answering +2

Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve

no code implementations CL (ACL) 2021 Oshin Agarwal, Yinfei Yang, Byron C. Wallace, Ani Nenkova

We examine these questions by contrasting the performance of several variants of LSTM-CRF architectures for named entity recognition, some of which are provided only with representations of the context as features.

named-entity-recognition Named Entity Recognition +1

Entity-Switched Datasets: An Approach to Auditing the In-Domain Robustness of Named Entity Recognition Models

1 code implementation 8 Apr 2020 Oshin Agarwal, Yinfei Yang, Byron C. Wallace, Ani Nenkova

We propose a method for auditing the in-domain robustness of systems, focusing specifically on differences in performance due to the national origin of entities.

Fairness named-entity-recognition +2

ReQA: An Evaluation for End-to-End Answer Retrieval Models

1 code implementation WS 2019 Amin Ahmad, Noah Constant, Yinfei Yang, Daniel Cer

Popular QA benchmarks like SQuAD have driven progress on the task of identifying answer spans within a specific passage, with models now surpassing human performance.

Information Retrieval Question Answering +2

Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction

no code implementations NAACL 2019 Yinfei Yang, Oshin Agarwal, Chris Tar, Byron C. Wallace, Ani Nenkova

Experiments on a complex biomedical information extraction task using expert and lay annotators show that: (i) simply excluding from the training data instances predicted to be difficult yields a small boost in performance; (ii) using difficulty scores to weight instances during training provides further, consistent gains; (iii) assigning instances predicted to be difficult to domain experts is an effective strategy for task routing.

Review Helpfulness Prediction with Embedding-Gated CNN

no code implementations 29 Aug 2018 Cen Chen, Minghui Qiu, Yinfei Yang, Jun Zhou, Jun Huang, Xiaolong Li, Forrest Bao

Product reviews, predominantly in the form of text, significantly help consumers finalize their purchasing decisions.

Sentence

Syntactic Patterns Improve Information Extraction for Medical Search

no code implementations NAACL 2018 Roma Patel, Yinfei Yang, Iain Marshall, Ani Nenkova, Byron Wallace

Medical professionals search the published literature by specifying the type of patients, the medical intervention(s) and the outcome measure(s) of interest.

Universal Sentence Encoder

24 code implementations 29 Mar 2018 Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil

For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance.

Conversational Response Selection Semantic Textual Similarity +7

Combining Lexical and Syntactic Features for Detecting Content-dense Texts in News

no code implementations 3 Apr 2017 Yinfei Yang, Ani Nenkova

On manually annotated data, we compare the performance of domain-specific classifiers, each trained on data from a single news domain, with that of a general classifier trained on data pooled from all four domains.

Question Answering

Detecting (Un)Important Content for Single-Document News Summarization

no code implementations EACL 2017 Yinfei Yang, Forrest Sheng Bao, Ani Nenkova

We present a robust approach for detecting intrinsic sentence importance in news, by training on two corpora of document-summary pairs.

Document Summarization News Summarization +1
