no code implementations • Findings (EMNLP) 2021 • Aashi Jain, Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, Yinfei Yang, Jason Baldridge
Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages.
no code implementations • ACL (SpLU-RoboNLP) 2021 • Sayali Kulkarni, Shailee Jain, Mohammad Javad Hosseini, Jason Baldridge, Eugene Ie, Li Zhang
We present a multi-level geocoding model (MLG) that learns to associate texts to geographic coordinates.
no code implementations • EMNLP (SpLU) 2020 • Harsh Mehta, Yoav Artzi, Jason Baldridge, Eugene Ie, Piotr Mirowski
These have been added to the StreetLearn dataset and can be obtained via the same process as used previously for StreetLearn.
no code implementations • 6 Apr 2022 • Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
no code implementations • 25 Nov 2021 • Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson
We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes.
1 code implementation • ICLR 2022 • Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu
Motivated by this success, we explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregressively.
no code implementations • 10 Sep 2021 • Aashi Jain, Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, Yinfei Yang, Jason Baldridge
Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages.
Ranked #1 on Semantic Image Similarity on CxC
1 code implementation • ICCV 2021 • Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals.
no code implementations • 5 Apr 2021 • Ramon Sanabria, Austin Waters, Jason Baldridge
Speech-based image retrieval has been studied as a proxy for joint representation learning, usually without emphasis on retrieval itself.
no code implementations • NAACL (ALVR) 2021 • Alexander Ku, Peter Anderson, Jordi Pont-Tuset, Jason Baldridge
PanGEA, the Panoramic Graph Environment Annotation toolkit, is a lightweight toolkit for collecting speech and text annotations in photo-realistic 3D environments.
no code implementations • EACL 2021 • Ming Zhao, Peter Anderson, Vihan Jain, Su Wang, Alexander Ku, Jason Baldridge, Eugene Ie
Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions.
1 code implementation • CVPR 2021 • Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
The quality of XMC-GAN's output is a major step up from previous models, as we show on three challenging datasets.
no code implementations • 7 Nov 2020 • Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
Localized Narratives is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases.
3 code implementations • EMNLP 2020 • Alexander Ku, Peter Anderson, Roma Patel, Eugene Ie, Jason Baldridge
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset.
no code implementations • 21 Aug 2020 • Sayali Kulkarni, Shailee Jain, Mohammad Javad Hosseini, Jason Baldridge, Eugene Ie, Li Zhang
We present a multi-level geocoding model (MLG) that learns to associate texts to geographic locations.
no code implementations • NAACL 2019 • Abhijit Mahabal, Jason Baldridge, Burcu Karagol Ayan, Vincent Perot, Dan Roth
Training data for text classification is often limited in practice, especially for applications with many output classes or involving many related classification problems.
2 code implementations • ACL 2020 • Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, Jason Baldridge
We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it.
1 code implementation • EACL 2021 • Zarana Parekh, Jason Baldridge, Daniel Cer, Austin Waters, Yinfei Yang
By supporting multi-modal retrieval training and evaluation, image captioning datasets have spurred remarkable progress on representation learning.
4 code implementations • 10 Jan 2020 • Harsh Mehta, Yoav Artzi, Jason Baldridge, Eugene Ie, Piotr Mirowski
These have been added to the StreetLearn dataset and can be obtained via the same process as used previously for StreetLearn.
Ranked #7 on Vision and Language Navigation on Touchdown Dataset
no code implementations • 12 Dec 2019 • James L. McClelland, Felix Hill, Maja Rudolph, Jason Baldridge, Hinrich Schütze
We take language to be a part of a system for understanding and communicating about situations.
no code implementations • CoNLL 2019 • Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, Diego Garcia-Olano
We show that it is feasible to perform entity linking by training a dual encoder (two-tower) model that encodes mentions and entities in the same dense vector space, where candidate entities are retrieved by approximate nearest neighbor search.
no code implementations • CoNLL 2019 • Gabriel Ilharco, Yuan Zhang, Jason Baldridge
Systems that can associate images with their spoken audio captions are an important step towards visually grounded language learning.
2 code implementations • IJCNLP 2019 • Yinfei Yang, Yuan Zhang, Chris Tar, Jason Baldridge
Most existing work on adversarial data generation focuses on English.
no code implementations • ICCV 2019 • Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie
Vision-and-Language Navigation (VLN) tasks such as Room-to-Room (R2R) require machine agents to interpret natural language instructions and learn to act in visually realistic environments to achieve navigation goals.
Ranked #90 on Vision and Language Navigation on VLN Challenge
1 code implementation • 11 Jul 2019 • Gabriel Ilharco, Vihan Jain, Alexander Ku, Eugene Ie, Jason Baldridge
We address fundamental flaws in previously used metrics and show how Dynamic Time Warping (DTW), a long-known method of measuring similarity between two time series, can be used for evaluation of navigation agents.
no code implementations • WS 2019 • Haoshuo Huang, Vihan Jain, Harsh Mehta, Jason Baldridge, Eugene Ie
Vision-and-Language Navigation (VLN) is a natural language grounding task where agents have to interpret natural language instructions in the context of visual scenes in a dynamic environment to achieve prescribed navigation goals.
no code implementations • ACL 2019 • Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge
We also show that the existing paths in the dataset are not ideal for evaluating instruction following because they are direct-to-goal shortest paths.
2 code implementations • NAACL 2019 • Yuan Zhang, Jason Baldridge, Luheng He
Existing paraphrase identification datasets lack sentence pairs that have high lexical overlap without being paraphrases.
no code implementations • 31 Oct 2018 • Su Wang, Rahul Gupta, Nancy Chang, Jason Baldridge
Paraphrasing is rooted in semantics.
4 code implementations • TACL 2018 • Kellie Webster, Marta Recasens, Vera Axelrod, Jason Baldridge
Coreference resolution is an important task for natural language understanding, and the resolution of ambiguous pronouns a longstanding challenge.
no code implementations • EMNLP 2018 • Yuan Zhang, Jason Riesa, Daniel Gillick, Anton Bakalov, Jason Baldridge, David Weiss
We address fine-grained multilingual language identification: providing a language code for every token in a sentence, including codemixed text containing multiple languages.
1 code implementation • EMNLP 2018 • Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das
Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning.
no code implementations • 26 Nov 2016 • Liang Sun, Jason Mielens, Jason Baldridge
Unsupervised models of dependency parsing typically require large amounts of clean, unlabeled data plus gold-standard part-of-speech tags.
1 code implementation • WS 2013 • Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, Jason Baldridge
We introduce a framework for lightweight dependency syntax annotation.