Search Results for author: Jason Baldridge

Found 53 papers, 18 papers with code

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

no code implementations 27 Oct 2023 Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above.

Question Answering Question Generation +2

Gaussian Process Probes (GPP) for Uncertainty-Aware Probing

1 code implementation 29 May 2023 Zi Wang, Alexander Ku, Jason Baldridge, Thomas L. Griffiths, Been Kim

Our experiments show it can (1) probe a model's representations of concepts even with a very small number of examples, (2) accurately measure both epistemic uncertainty (how confident the probe is) and aleatory uncertainty (how fuzzy the concepts are to the model), and (3) detect out-of-distribution data using those uncertainty measures as well as classic methods do.

Gaussian Processes

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

no code implementations CVPR 2023 Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan

Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.

Image Inpainting text-guided-image-editing

Underspecification in Scene Description-to-Depiction Tasks

no code implementations 11 Oct 2022 Ben Hutchinson, Jason Baldridge, Vinodkumar Prabhakaran

Questions regarding implicitness, ambiguity and underspecification are crucial for understanding the task validity and ethical concerns of multimodal image+text systems, yet have received little attention to date.

A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

no code implementations CVPR 2023 Aishwarya Kamath, Peter Anderson, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh

Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments, as a step towards robots that can follow human instructions.

Ranked #1 on Vision and Language Navigation on RxR (using extra training data)

Imitation Learning Instruction Following +1

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

2 code implementations 22 Jun 2022 Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu

We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.

Machine Translation

Vector-quantized Image Modeling with Improved VQGAN

3 code implementations ICLR 2022 Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu

Motivated by this success, we explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregressively.

Image Generation Representation Learning +1

MURAL: Multimodal, Multitask Retrieval Across Languages

no code implementations 10 Sep 2021 Aashi Jain, Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, Yinfei Yang, Jason Baldridge

Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages.

Image-text matching Retrieval +5

Pathdreamer: A World Model for Indoor Navigation

1 code implementation ICCV 2021 Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals.

Semantic Segmentation Vision and Language Navigation

Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval

no code implementations 5 Apr 2021 Ramon Sanabria, Austin Waters, Jason Baldridge

Speech-based image retrieval has been studied as a proxy for joint representation learning, usually without emphasis on retrieval itself.

Automatic Speech Recognition (ASR) +4

PanGEA: The Panoramic Graph Environment Annotation Toolkit

no code implementations NAACL (ALVR) 2021 Alexander Ku, Peter Anderson, Jordi Pont-Tuset, Jason Baldridge

PanGEA, the Panoramic Graph Environment Annotation toolkit, is a lightweight toolkit for collecting speech and text annotations in photo-realistic 3D environments.

Instruction Following

On the Evaluation of Vision-and-Language Navigation Instructions

no code implementations EACL 2021 Ming Zhao, Peter Anderson, Vihan Jain, Su Wang, Alexander Ku, Jason Baldridge, Eugene Ie

Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions.

Vision and Language Navigation

Cross-Modal Contrastive Learning for Text-to-Image Generation

1 code implementation CVPR 2021 Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang

The quality of XMC-GAN's output is a major step up from previous models, as we show on three challenging datasets.

Ranked #26 on Text-to-Image Generation on COCO (using extra training data)

Contrastive Learning

Text-to-Image Generation Grounded by Fine-Grained User Attention

no code implementations 7 Nov 2020 Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang

Localized Narratives is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases.

Retrieval Segmentation +1

Spatial Language Representation with Multi-Level Geocoding

1 code implementation 21 Aug 2020 Sayali Kulkarni, Shailee Jain, Mohammad Javad Hosseini, Jason Baldridge, Eugene Ie, Li Zhang

We present a multi-level geocoding model (MLG) that learns to associate texts to geographic locations.

Toponym Resolution

Text Classification with Few Examples using Controlled Generalization

no code implementations NAACL 2019 Abhijit Mahabal, Jason Baldridge, Burcu Karagol Ayan, Vincent Perot, Dan Roth

Training data for text classification is often limited in practice, especially for applications with many output classes or involving many related classification problems.

General Classification text-classification +2

Mapping Natural Language Instructions to Mobile UI Action Sequences

2 code implementations ACL 2020 Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, Jason Baldridge

We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it.

Extending Machine Language Models toward Human-Level Language Understanding

no code implementations 12 Dec 2019 James L. McClelland, Felix Hill, Maja Rudolph, Jason Baldridge, Hinrich Schütze

We take language to be a part of a system for understanding and communicating about situations.

Learning Dense Representations for Entity Retrieval

no code implementations CONLL 2019 Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, Diego Garcia-Olano

We show that it is feasible to perform entity linking by training a dual encoder (two-tower) model that encodes mentions and entities in the same dense vector space, where candidate entities are retrieved by approximate nearest neighbor search.

Entity Linking Entity Retrieval +1
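
The retrieval step described in this entry can be illustrated with a minimal sketch. It assumes the mention and entity vectors have already been produced by two trained encoders (not shown here), and it uses exact cosine-similarity search as a simple stand-in for approximate nearest neighbor search. The function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def retrieve(
    mention_vec: Sequence[float],
    entity_vecs: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k entities whose vectors are most similar to the mention vector.

    Exact search over all entities; a real system would swap in an
    approximate nearest neighbor index for large entity sets.
    """
    query = normalize(mention_vec)
    scored = []
    for name, vec in entity_vecs.items():
        entity = normalize(vec)
        score = sum(q * e for q, e in zip(query, entity))
        scored.append((name, score))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for encoder outputs (hypothetical values).
entities = {
    "Paris_France": [0.9, 0.1, 0.0],
    "Paris_Texas": [0.1, 0.9, 0.0],
}
top = retrieve([0.8, 0.2, 0.0], entities, k=1)
```

Because both towers map into the same vector space, candidate lookup reduces to a similarity search, which is what makes precomputing and indexing the entity vectors possible.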

Transferable Representation Learning in Vision-and-Language Navigation

no code implementations ICCV 2019 Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie

Vision-and-Language Navigation (VLN) tasks such as Room-to-Room (R2R) require machine agents to interpret natural language instructions and learn to act in visually realistic environments to achieve navigation goals.

Representation Learning Vision and Language Navigation

General Evaluation for Instruction Conditioned Navigation using Dynamic Time Warping

1 code implementation 11 Jul 2019 Gabriel Ilharco, Vihan Jain, Alexander Ku, Eugene Ie, Jason Baldridge

We address fundamental flaws in previously used metrics and show how Dynamic Time Warping (DTW), a long known method of measuring similarity between two time series, can be used for evaluation of navigation agents.

Dynamic Time Warping Navigate +2
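
As a rough illustration of the Dynamic Time Warping idea mentioned in this entry, the sketch below computes the classic cumulative DTW cost between two paths given as sequences of 2D points. It is the textbook dynamic program, not the paper's exact evaluation metric (any normalization or success weighting is omitted), and the function name and example paths are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(path_a: Sequence[Point], path_b: Sequence[Point]) -> float:
    """Cumulative DTW cost between two non-empty sequences of 2D points."""
    if not path_a or not path_b:
        raise ValueError("both paths must be non-empty")
    n, m = len(path_a), len(path_b)
    # cost[i][j] is the best alignment cost of path_a[:i] against path_b[:j].
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(path_a[i - 1], path_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance only in path_a
                cost[i][j - 1],      # advance only in path_b
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
shortcut = [(0.0, 0.0), (2.0, 0.0)]
same = dtw_distance(reference, reference)
skipped = dtw_distance(reference, shortcut)
```

Unlike a metric that only checks the final position, the cost grows when an agent's path skips or strays from intermediate points of the reference path, even if both end at the same goal.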

Multi-modal Discriminative Model for Vision-and-Language Navigation

no code implementations WS 2019 Haoshuo Huang, Vihan Jain, Harsh Mehta, Jason Baldridge, Eugene Ie

Vision-and-Language Navigation (VLN) is a natural language grounding task where agents have to interpret natural language instructions in the context of visual scenes in a dynamic environment to achieve prescribed navigation goals.

Vision and Language Navigation

Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation

no code implementations ACL 2019 Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge

We also show that the existing paths in the dataset are not ideal for evaluating instruction following because they are direct-to-goal shortest paths.

Instruction Following Vision and Language Navigation

PAWS: Paraphrase Adversaries from Word Scrambling

2 code implementations NAACL 2019 Yuan Zhang, Jason Baldridge, Luheng He

Existing paraphrase identification datasets lack sentence pairs that have high lexical overlap without being paraphrases.

Paraphrase Identification Translation

Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns

4 code implementations TACL 2018 Kellie Webster, Marta Recasens, Vera Axelrod, Jason Baldridge

Coreference resolution is an important task for natural language understanding, and the resolution of ambiguous pronouns a longstanding challenge.

Natural Language Understanding

A Fast, Compact, Accurate Model for Language Identification of Codemixed Text

no code implementations EMNLP 2018 Yuan Zhang, Jason Riesa, Daniel Gillick, Anton Bakalov, Jason Baldridge, David Weiss

We address fine-grained multilingual language identification: providing a language code for every token in a sentence, including codemixed text containing multiple languages.

Language Identification

Learning To Split and Rephrase From Wikipedia Edit History

1 code implementation EMNLP 2018 Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das

Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning.

Split and Rephrase

Fill it up: Exploiting partial dependency annotations in a minimum spanning tree parser

no code implementations 26 Nov 2016 Liang Sun, Jason Mielens, Jason Baldridge

Unsupervised models of dependency parsing typically require large amounts of clean, unlabeled data plus gold-standard part-of-speech tags.

Dependency Parsing
