no code implementations • 27 Feb 2024 • Raghav Kapoor, Yash Parag Butala, Melisa Russak, Jing Yu Koh, Kiran Kamble, Waseem AlShikh, Ruslan Salakhutdinov
For decades, human-computer interaction has fundamentally been manual.
1 code implementation • 24 Jan 2024 • Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried
Through extensive quantitative and qualitative analysis, we identify several limitations of text-only LLM agents, and reveal gaps in the capabilities of state-of-the-art multimodal language agents.
1 code implementation • 11 Oct 2023 • Minji Yoon, Jing Yu Koh, Bryan Hooi, Ruslan Salakhutdinov
We study three research questions raised by MMGL: (1) how can we infuse information from multiple neighbors into pretrained LMs while avoiding scalability issues?
1 code implementation • NeurIPS 2023 • Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov
This mapping network translates hidden representations of text into the embedding space of the visual models, enabling us to leverage the strong text representations of the LLM for visual outputs.
no code implementations • ICCV 2023 • Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun
Recent work has shown the possibility of training generative models of 3D content from 2D image collections on small datasets corresponding to a single object class, such as human faces, animal faces, or cars.
1 code implementation • 31 Jan 2023 • Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried
We propose an efficient method to ground pretrained text-only language models to the visual domain, enabling them to process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images.
no code implementations • CVPR 2023 • Aishwarya Kamath, Peter Anderson, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh
Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments, as a step towards robots that can follow human instructions.
Ranked #1 on Vision and Language Navigation on RxR (using extra training data)
2 code implementations • 22 Jun 2022 • Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu
We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.
Ranked #1 on Text-to-Image Generation on LAION COCO
1 code implementation • 6 Apr 2022 • Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
5 code implementations • ICLR 2022 • Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu
Motivated by this success, we explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregressively.
1 code implementation • ICCV 2021 • Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals.
1 code implementation • ICLR 2021 • Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas Huang, Hyungsuk Yoon, Honglak Lee, Seunghoon Hong
Despite recent advances in the literature, existing approaches are limited to moderately short-term prediction (less than a few seconds), and extrapolating them further into the future quickly degrades structure and content.
1 code implementation • CVPR 2021 • Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
The quality of XMC-GAN's output is a major step up from previous models, as we show on three challenging datasets.
Ranked #27 on Text-to-Image Generation on MS COCO (using extra training data)
no code implementations • 7 Nov 2020 • Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
Localized Narratives is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases.
no code implementations • ECCV 2020 • Jing Yu Koh, Duc Thanh Nguyen, Quang-Trung Truong, Sai-Kit Yeung, Alexander Binder
Fully-automatic execution is the ultimate goal for many Computer Vision applications.
no code implementations • ECCV 2018 • Tian Feng, Quang-Trung Truong, Duc Thanh Nguyen, Jing Yu Koh, Lap-Fai Yu, Alexander Binder, Sai-Kit Yeung
Urban zoning enables various applications in land use analysis and urban planning.
no code implementations • 29 Jun 2016 • Jing Yu Koh, Wojciech Samek, Klaus-Robert Müller, Alexander Binder
We propose a novel strategy for solving this task when pixel-level annotations are not available, performing it in an almost zero-shot manner by relying on conventional whole-image neural network classifiers that were trained using large bounding boxes.