no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.
Ranked #7 on Text-to-Video Generation on UCF-101
no code implementations • 11 Dec 2023 • Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama
We present W. A. L. T, a transformer-based approach for photorealistic video generation via diffusion modeling.
Ranked #1 on Video Prediction on Kinetics-600 12 frames, 64x64
no code implementations • 30 Nov 2023 • Meera Hahn, Amit Raj, James M. Rehg
The challenging task of Vision-and-Language Navigation (VLN) requires embodied agents to follow natural language instructions to reach a goal location or object (e. g. `walk down the hallway and turn left at the piano').
1 code implementation • 24 Nov 2023 • Nikolai Warner, Meera Hahn, Jonathan Huang, Irfan Essa, Vighnesh Birodkar
We propose a new segmentation process, Text + Click segmentation, where a model takes as input an image, a text phrase describing a class to segment, and a single foreground click specifying the instance to segment.
no code implementations • 10 Oct 2022 • Meera Hahn, James M. Rehg
We address the challenging task of Localization via Embodied Dialog (LED).
no code implementations • 7 Oct 2022 • Meera Hahn, Kevin Carlberg, Ruta Desai, James Hillis
We introduce a novel interface for large scale collection of human memory and assistance.
1 code implementation • NeurIPS 2021 • Meera Hahn, Devendra Chaplot, Shubham Tulsiani, Mustafa Mukadam, James M. Rehg, Abhinav Gupta
Most prior methods for learning navigation policies require access to simulation environments, as they need online policy interaction and rely on ground-truth maps for rewards.
2 code implementations • EMNLP 2020 • Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James M. Rehg, Stefan Lee, Peter Anderson
In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices.
no code implementations • 22 Apr 2019 • Meera Hahn, Asim Kadav, James M. Rehg, Hans Peter Graf
Localizing moments in untrimmed videos via language queries is a new and interesting task that requires the ability to accurately ground language into video.
no code implementations • 2 Jan 2019 • Meera Hahn, Andrew Silva, James M. Rehg
We describe a novel cross-modal embedding space for actions, named Action2Vec, which combines linguistic cues from class labels with spatio-temporal features derived from video clips.
no code implementations • 22 Sep 2018 • Meera Hahn, Nataniel Ruiz, Jean-Baptiste Alayrac, Ivan Laptev, James M. Rehg
Automatic generation of textual video descriptions that are time-aligned with video content is a long-standing goal in computer vision.
no code implementations • 13 Dec 2015 • Meera Hahn, Si Chen, Afshin Dehghan
In this paper, we study a discriminatively trained deep convolutional network for the task of visual tracking.