no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.
Ranked #3 on Text-to-Video Generation on MSR-VTT
no code implementations • 11 Dec 2023 • Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama
We present W.A.L.T, a transformer-based approach for photorealistic video generation via diffusion modeling.
Ranked #1 on Video Prediction on Kinetics-600 12 frames, 64x64
1 code implementation • NeurIPS 2023 • Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang
The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption.
no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang
While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.
Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64
no code implementations • 20 Jun 2023 • Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz, Abbas Abdolmaleki, Oliver Groth, Jean-Baptiste Regli, Oleg Sushkov, Tom Rothörl, José Enrique Chen, Yusuf Aytar, Dave Barker, Joy Ortiz, Martin Riedmiller, Jost Tobias Springenberg, Raia Hadsell, Francesco Nori, Nicolas Heess
With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot and through adaptation using only 100-1000 examples for the target task.
no code implementations • 29 Nov 2022 • Agrim Gupta, Sajjad Nassirpour, Manideep Dunna, Eamon Patamasing, Alireza Vahid, Dinesh Bharadia
The reason is that MIMO traditionally requires a separate RF chain per antenna, so power consumption scales with the number of antennas rather than the number of users, making it energy inefficient.
2 code implementations • 6 Oct 2022 • Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi Fan
We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts, interleaving textual and visual tokens.
no code implementations • 23 Jun 2022 • Agrim Gupta, Stephen Tian, Yunzhi Zhang, Jiajun Wu, Roberto Martín-Martín, Li Fei-Fei
This work shows that we can create good video prediction models by pre-training transformers via masked visual modeling.
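The core idea of masked visual modeling can be illustrated with a minimal sketch: split video frames into patch tokens, corrupt a random subset, and train a model to reconstruct only the corrupted positions. The names and constants here (`mask_tokens`, `PATCH_DIM`, `MASK_RATIO`, zeros as a stand-in for a learned mask embedding) are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

PATCH_DIM = 16   # dimensionality of each patch token (illustrative)
MASK_RATIO = 0.5  # fraction of tokens to mask (illustrative)

def mask_tokens(tokens, mask_ratio=MASK_RATIO, rng=None):
    """Replace a random subset of token vectors with a mask token
    (here, zeros) and return the corrupted tokens plus the boolean mask."""
    rng = rng or np.random.default_rng(0)
    n = tokens.shape[0]
    n_masked = int(n * mask_ratio)
    idx = rng.choice(n, size=n_masked, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    corrupted = tokens.copy()
    corrupted[mask] = 0.0  # stand-in for a learned [MASK] embedding
    return corrupted, mask

# Example: 64 patch tokens from a short clip.
tokens = np.random.default_rng(1).normal(size=(64, PATCH_DIM))
corrupted, mask = mask_tokens(tokens)

# During pre-training, the reconstruction loss is computed only on the
# masked positions, e.g. loss = mse(model(corrupted)[mask], tokens[mask]).
print(mask.sum())  # 32 of the 64 tokens are masked
```

Restricting the loss to masked positions is what makes the pre-training task non-trivial: the model must use visible patches (and, for video, temporal context) to infer the missing ones.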
2 code implementations • ICLR 2022 • Agrim Gupta, Linxi Fan, Surya Ganguli, Li Fei-Fei
Multiple domains like vision, natural language, and audio are witnessing tremendous progress by leveraging Transformers for large-scale pre-training followed by task-specific fine-tuning.
1 code implementation • 3 Feb 2021 • Agrim Gupta, Silvio Savarese, Surya Ganguli, Li Fei-Fei
However, the principles governing the relations between environmental complexity, evolved morphology, and the learnability of intelligent control remain elusive, partially due to the substantial challenge of performing large-scale in silico experiments on evolution and learning.
no code implementations • 31 Dec 2020 • Agrim Gupta, Cedric Girerd, Manideep Dunna, Qiming Zhang, Raghav Subbaraman, Tania Morimoto, Dinesh Bharadia
Contact force is a natural way for humans to interact with the physical world around them.
3 code implementations • CVPR 2019 • Agrim Gupta, Piotr Dollár, Ross Girshick
We plan to collect ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images.
4 code implementations • CVPR 2018 • Justin Johnson, Agrim Gupta, Li Fei-Fei
To overcome this limitation, we propose a method for generating images from scene graphs, enabling explicit reasoning about objects and their relationships.
Ranked #4 on Layout-to-Image Generation on Visual Genome 64x64
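A scene graph of the kind this method consumes can be sketched as a set of (subject, predicate, object) triples; the `Triple` class and `scene_graph_objects` helper below are illustrative assumptions for exposition, not the paper's actual data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    """One edge of a scene graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str

def scene_graph_objects(triples):
    """Collect the unique object nodes referenced by a set of triples,
    in first-appearance order."""
    objects = []
    for t in triples:
        for name in (t.subject, t.obj):
            if name not in objects:
                objects.append(name)
    return objects

# A toy scene graph describing a simple outdoor scene.
graph = [
    Triple("sheep", "standing on", "grass"),
    Triple("sky", "above", "grass"),
    Triple("tree", "behind", "sheep"),
]

print(scene_graph_objects(graph))  # ['sheep', 'grass', 'sky', 'tree']
```

Explicitly enumerating objects and their pairwise relationships like this is what lets a graph-conditioned generator reason about scene layout, rather than treating the description as an unstructured caption.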
7 code implementations • CVPR 2018 • Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, Alexandre Alahi
Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments.
Ranked #4 on Trajectory Prediction on ETH
no code implementations • ICCV 2017 • Agrim Gupta, Justin Johnson, Alexandre Alahi, Li Fei-Fei
Recent progress in style transfer on images has focused on improving the quality of stylized images and the speed of stylization methods.