no code implementations • EMNLP (SpLU) 2020 • Harsh Mehta, Yoav Artzi, Jason Baldridge, Eugene Ie, Piotr Mirowski
These have been added to the StreetLearn dataset and can be obtained via the same process as used previously for StreetLearn.
no code implementations • ACL (splurobonlp) 2021 • Sayali Kulkarni, Shailee Jain, Mohammad Javad Hosseini, Jason Baldridge, Eugene Ie, Li Zhang
We present a multi-level geocoding model (MLG) that learns to associate texts to geographic coordinates.
no code implementations • Findings (EMNLP) 2021 • Aashi Jain, Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, Yinfei Yang, Jason Baldridge
Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages.
no code implementations • 27 Oct 2023 • Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang
With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above.
1 code implementation • 29 May 2023 • Zi Wang, Alexander Ku, Jason Baldridge, Thomas L. Griffiths, Been Kim
Our experiments show it can (1) probe a model's representations of concepts even with a very small number of examples, (2) accurately measure both epistemic uncertainty (how confident the probe is) and aleatory uncertainty (how fuzzy the concepts are to the model), and (3) detect out-of-distribution data using those uncertainty measures as well as classic methods do.
no code implementations • 23 Mar 2023 • Haoxuan You, Mandy Guo, Zhecan Wang, Kai-Wei Chang, Jason Baldridge, Jiahui Yu
The field of vision and language has witnessed a proliferation of pre-trained foundation models.
no code implementations • CVPR 2023 • Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan
Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
no code implementations • 11 Oct 2022 • Ben Hutchinson, Jason Baldridge, Vinodkumar Prabhakaran
Questions regarding implicitness, ambiguity and underspecification are crucial for understanding the task validity and ethical concerns of multimodal image+text systems, yet have received little attention to date.
no code implementations • CVPR 2023 • Aishwarya Kamath, Peter Anderson, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh
Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments, as a step towards robots that can follow human instructions.
Ranked #1 on Vision and Language Navigation on RxR (using extra training data)
2 code implementations • 22 Jun 2022 • Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu
We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.
Ranked #1 on Text-to-Image Generation on LAION COCO
1 code implementation • 6 Apr 2022 • Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
no code implementations • CVPR 2022 • Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson
We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes.
3 code implementations • ICLR 2022 • Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu
Motivated by this success, we explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregressively.
no code implementations • 10 Sep 2021 • Aashi Jain, Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, Yinfei Yang, Jason Baldridge
Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages.
Ranked #1 on Semantic Image-Text Similarity on CxC
1 code implementation • ICCV 2021 • Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals.
no code implementations • 5 Apr 2021 • Ramon Sanabria, Austin Waters, Jason Baldridge
Speech-based image retrieval has been studied as a proxy for joint representation learning, usually without emphasis on retrieval itself.
no code implementations • NAACL (ALVR) 2021 • Alexander Ku, Peter Anderson, Jordi Pont-Tuset, Jason Baldridge
PanGEA, the Panoramic Graph Environment Annotation toolkit, is a lightweight toolkit for collecting speech and text annotations in photo-realistic 3D environments.
no code implementations • EACL 2021 • Ming Zhao, Peter Anderson, Vihan Jain, Su Wang, Alexander Ku, Jason Baldridge, Eugene Ie
Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions.
1 code implementation • CVPR 2021 • Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
The quality of XMC-GAN's output is a major step up from previous models, as we show on three challenging datasets.
Ranked #26 on Text-to-Image Generation on COCO (using extra training data)
no code implementations • 7 Nov 2020 • Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
Localized Narratives is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases.
3 code implementations • EMNLP 2020 • Alexander Ku, Peter Anderson, Roma Patel, Eugene Ie, Jason Baldridge
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset.
Ranked #5 on Vision and Language Navigation on RxR
1 code implementation • 21 Aug 2020 • Sayali Kulkarni, Shailee Jain, Mohammad Javad Hosseini, Jason Baldridge, Eugene Ie, Li Zhang
We present a multi-level geocoding model (MLG) that learns to associate texts to geographic locations.
no code implementations • NAACL 2019 • Abhijit Mahabal, Jason Baldridge, Burcu Karagol Ayan, Vincent Perot, Dan Roth
Training data for text classification is often limited in practice, especially for applications with many output classes or involving many related classification problems.
2 code implementations • ACL 2020 • Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, Jason Baldridge
We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it.
2 code implementations • EACL 2021 • Zarana Parekh, Jason Baldridge, Daniel Cer, Austin Waters, Yinfei Yang
By supporting multi-modal retrieval training and evaluation, image captioning datasets have spurred remarkable progress on representation learning.
4 code implementations • 10 Jan 2020 • Harsh Mehta, Yoav Artzi, Jason Baldridge, Eugene Ie, Piotr Mirowski
These have been added to the StreetLearn dataset and can be obtained via the same process as used previously for StreetLearn.
Ranked #7 on Vision and Language Navigation on Touchdown Dataset
no code implementations • 12 Dec 2019 • James L. McClelland, Felix Hill, Maja Rudolph, Jason Baldridge, Hinrich Schütze
We take language to be a part of a system for understanding and communicating about situations.
no code implementations • CONLL 2019 • Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, Diego Garcia-Olano
We show that it is feasible to perform entity linking by training a dual encoder (two-tower) model that encodes mentions and entities in the same dense vector space, where candidate entities are retrieved by approximate nearest neighbor search.
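As a rough illustration of retrieval in a shared vector space, the sketch below scores a mention vector against entity vectors by dot product and returns the best matches. The vectors, entity names, and the `retrieve` helper are hypothetical, and the search here is exact brute force, standing in for the approximate nearest neighbor search the abstract describes. It does not reproduce the paper's encoders or training.

```python
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def retrieve(mention_vec: Sequence[float],
             entity_vecs: Dict[str, Sequence[float]],
             k: int = 1) -> List[str]:
    """Return the k entity names whose vectors score highest against the mention.

    Brute-force exact search; a real system would use an approximate
    nearest neighbor index over a much larger entity set.
    """
    ranked = sorted(entity_vecs.items(),
                    key=lambda item: dot(mention_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]


# Hand-made toy embeddings (assumed values, not model outputs).
entities = {
    "Paris_France": [0.9, 0.1],
    "Paris_Texas": [0.1, 0.9],
}
mention = [1.0, 0.2]  # hypothetical encoding of a mention of "Paris"
top = retrieve(mention, entities, k=1)
```

Because mentions and entities live in the same space, entity vectors can be computed once and indexed, and only the mention needs encoding at query time.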
no code implementations • CONLL 2019 • Gabriel Ilharco, Yuan Zhang, Jason Baldridge
Systems that can associate images with their spoken audio captions are an important step towards visually grounded language learning.
3 code implementations • IJCNLP 2019 • Yinfei Yang, Yuan Zhang, Chris Tar, Jason Baldridge
Most existing work on adversarial data generation focuses on English.
no code implementations • ICCV 2019 • Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie
Vision-and-Language Navigation (VLN) tasks such as Room-to-Room (R2R) require machine agents to interpret natural language instructions and learn to act in visually realistic environments to achieve navigation goals.
Ranked #115 on Vision and Language Navigation on VLN Challenge
1 code implementation • 11 Jul 2019 • Gabriel Ilharco, Vihan Jain, Alexander Ku, Eugene Ie, Jason Baldridge
We address fundamental flaws in previously used metrics and show how Dynamic Time Warping (DTW), a long known method of measuring similarity between two time series, can be used for evaluation of navigation agents.
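For readers unfamiliar with Dynamic Time Warping, here is a minimal sketch of the textbook dynamic-programming formulation applied to two paths given as 2D points. The function name `dtw_distance`, the Euclidean point distance, and the example paths are assumptions made for illustration. This is plain DTW, not the specific evaluation metric defined in the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def euclidean(p: Point, q: Point) -> float:
    """Straight-line distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dtw_distance(a: Sequence[Point],
                 b: Sequence[Point],
                 dist: Callable[[Point, Point], float] = euclidean) -> float:
    """Classic DTW: minimum cumulative cost of aligning sequence a with b."""
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] with b[:j].
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # advance in a only
                                    cost[i][j - 1],      # advance in b only
                                    cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


# Hypothetical reference path and agent path along a straight corridor.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
agent = [(0.0, 0.0), (2.0, 0.0)]
same = dtw_distance(reference, reference)
skipped = dtw_distance(reference, agent)
```

An identical path aligns at zero cost, while the agent path that skips the middle waypoint pays for the unmatched point, so the score reflects how closely the whole trajectory is followed rather than only where it ends.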
no code implementations • WS 2019 • Haoshuo Huang, Vihan Jain, Harsh Mehta, Jason Baldridge, Eugene Ie
Vision-and-Language Navigation (VLN) is a natural language grounding task where agents have to interpret natural language instructions in the context of visual scenes in a dynamic environment to achieve prescribed navigation goals.
no code implementations • ACL 2019 • Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge
We also show that the existing paths in the dataset are not ideal for evaluating instruction following because they are direct-to-goal shortest paths.
2 code implementations • NAACL 2019 • Yuan Zhang, Jason Baldridge, Luheng He
Existing paraphrase identification datasets lack sentence pairs that have high lexical overlap without being paraphrases.
no code implementations • 31 Oct 2018 • Su Wang, Rahul Gupta, Nancy Chang, Jason Baldridge
Paraphrasing is rooted in semantics.
4 code implementations • TACL 2018 • Kellie Webster, Marta Recasens, Vera Axelrod, Jason Baldridge
Coreference resolution is an important task for natural language understanding, and the resolution of ambiguous pronouns a longstanding challenge.
no code implementations • EMNLP 2018 • Yuan Zhang, Jason Riesa, Daniel Gillick, Anton Bakalov, Jason Baldridge, David Weiss
We address fine-grained multilingual language identification: providing a language code for every token in a sentence, including codemixed text containing multiple languages.
1 code implementation • EMNLP 2018 • Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das
Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning.
no code implementations • 26 Nov 2016 • Liang Sun, Jason Mielens, Jason Baldridge
Unsupervised models of dependency parsing typically require large amounts of clean, unlabeled data plus gold-standard part-of-speech tags.
1 code implementation • WS 2013 • Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, Jason Baldridge
We introduce a framework for lightweight dependency syntax annotation.