no code implementations • 30 Apr 2024 • Yasumasa Onoe, Sunayana Rane, Zachary Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason Baldridge
Vision-language datasets are vital for both text-to-image (T2I) and image-to-text (I2T) research.
no code implementations • 27 Dec 2023 • Siddhartha Datta, Alexander Ku, Deepak Ramachandran, Peter Anderson
Text-to-image generation models are powerful but difficult to use.
no code implementations • CVPR 2023 • Aishwarya Kamath, Peter Anderson, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh
Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments, as a step towards robots that can follow human instructions.
Ranked #1 on Vision and Language Navigation on RxR (using extra training data)
2 code implementations • 22 Jun 2022 • Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu
We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.
Ranked #1 on Text-to-Image Generation on LAION COCO
5 code implementations • ICLR 2022 • Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu
We explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregressively.
no code implementations • NAACL (ALVR) 2021 • Alexander Ku, Peter Anderson, Jordi Pont-Tuset, Jason Baldridge
PanGEA, the Panoramic Graph Environment Annotation toolkit, is a lightweight toolkit for collecting speech and text annotations in photo-realistic 3D environments.
no code implementations • EACL 2021 • Ming Zhao, Peter Anderson, Vihan Jain, Su Wang, Alexander Ku, Jason Baldridge, Eugene Ie
Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions.
3 code implementations • EMNLP 2020 • Alexander Ku, Peter Anderson, Roma Patel, Eugene Ie, Jason Baldridge
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset.
Ranked #5 on Vision and Language Navigation on RxR
no code implementations • ICCV 2019 • Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie
Vision-and-Language Navigation (VLN) tasks such as Room-to-Room (R2R) require machine agents to interpret natural language instructions and learn to act in visually realistic environments to achieve navigation goals.
Ranked #115 on Vision and Language Navigation on VLN Challenge
1 code implementation • 11 Jul 2019 • Gabriel Ilharco, Vihan Jain, Alexander Ku, Eugene Ie, Jason Baldridge
We address fundamental flaws in previously used metrics and show how Dynamic Time Warping (DTW), a long-known method for measuring similarity between two time series, can be used to evaluate navigation agents.
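To make the idea concrete, here is a minimal sketch of textbook DTW applied to two paths given as lists of coordinate tuples. It is not the metric defined in the paper, whose exact formulation is not given in this excerpt. The function name `dtw_distance`, the Euclidean point distance, and the toy paths are all assumptions chosen for illustration.

```python
import math


def dtw_distance(path_a, path_b):
    """Classic dynamic-programming DTW between two sequences of points.

    Assumption: points are equal-length coordinate tuples and the
    point-to-point cost is Euclidean distance.
    """
    n, m = len(path_a), len(path_b)
    # cost[i][j] = minimal cumulative cost of aligning the first i points
    # of path_a with the first j points of path_b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(path_a[i - 1], path_b[j - 1])
            # Extend the cheapest of the three allowed alignments:
            # advance in path_a only, in path_b only, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical reference path and agent path on a 2D grid.
reference = [(0, 0), (1, 0), (2, 0)]
agent = [(0, 0), (0, 0), (1, 0), (2, 0)]  # same route, one repeated step
print(dtw_distance(reference, agent))
```

Because DTW allows one point to align with several points in the other sequence, an agent that lingers at a location but follows the same route incurs no extra cost, whereas an agent that deviates from the route accumulates distance at every misaligned step.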
no code implementations • ACL 2019 • Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge
We also show that the existing paths in the dataset are not ideal for evaluating instruction following because they are direct-to-goal shortest paths.
no code implementations • 15 Feb 2018 • Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran
Image generation has been successfully cast as an autoregressive sequence generation or transformation problem.
Ranked #3 on Density Estimation on CIFAR-10
no code implementations • ICLR 2018 • Francois W. Belletti, Alexander Ku, Joseph E. Gonzalez
Designing neural networks for continuous-time stochastic processes is challenging, especially when observations are made irregularly.
no code implementations • ICLR 2018 • Joshua Peterson, Krishan Aghi, Jordan Suchow, Alexander Ku, Tom Griffiths
In this paper, we introduce a method for estimating the structure of human categories that draws on ideas from both cognitive science and machine learning, blending human-based algorithms with state-of-the-art deep representation learners.