no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.
Ranked #3 on Text-to-Video Generation on MSR-VTT
no code implementations • 11 Dec 2023 • Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama
We present W.A.L.T, a transformer-based approach for photorealistic video generation via diffusion modeling.
Ranked #1 on Video Prediction on Kinetics-600 12 frames, 64x64
1 code implementation • NeurIPS 2023 • Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang
The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption.
no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang
While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.
Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64
no code implementations • 20 Jun 2023 • Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz, Abbas Abdolmaleki, Oliver Groth, Jean-Baptiste Regli, Oleg Sushkov, Tom Rothörl, José Enrique Chen, Yusuf Aytar, Dave Barker, Joy Ortiz, Martin Riedmiller, Jost Tobias Springenberg, Raia Hadsell, Francesco Nori, Nicolas Heess
With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot and through adaptation using only 100-1000 examples for the target task.
no code implementations • 29 Nov 2022 • Agrim Gupta, Sajjad Nassirpour, Manideep Dunna, Eamon Patamasing, Alireza Vahid, Dinesh Bharadia
The reason is that MIMO traditionally requires a separate RF chain per antenna, so power consumption scales with the number of antennas rather than the number of users, making it energy inefficient.
2 code implementations • 6 Oct 2022 • Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi Fan
We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts, interleaving textual and visual tokens.
no code implementations • 23 Jun 2022 • Agrim Gupta, Stephen Tian, Yunzhi Zhang, Jiajun Wu, Roberto Martín-Martín, Li Fei-Fei
This work shows that we can create good video prediction models by pre-training transformers via masked visual modeling.
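The core idea of masked visual modeling can be illustrated with a minimal sketch: split video frames into patch tokens, corrupt a random subset, and train a model to reconstruct only the corrupted positions. The names and constants here (`mask_tokens`, `PATCH_DIM`, `MASK_RATIO`, zeros as a stand-in for a learned mask embedding) are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

PATCH_DIM = 16   # dimensionality of each patch token (illustrative)
MASK_RATIO = 0.5  # fraction of tokens to mask (illustrative)

def mask_tokens(tokens, mask_ratio=MASK_RATIO, rng=None):
    """Replace a random subset of token vectors with a mask token
    (here, zeros) and return the corrupted tokens plus the boolean mask."""
    rng = rng or np.random.default_rng(0)
    n = tokens.shape[0]
    n_masked = int(n * mask_ratio)
    idx = rng.choice(n, size=n_masked, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    corrupted = tokens.copy()
    corrupted[mask] = 0.0  # stand-in for a learned [MASK] embedding
    return corrupted, mask

# Example: 64 patch tokens from a short clip.
tokens = np.random.default_rng(1).normal(size=(64, PATCH_DIM))
corrupted, mask = mask_tokens(tokens)

# During pre-training, the reconstruction loss is computed only on the
# masked positions, e.g. loss = mse(model(corrupted)[mask], tokens[mask]).
print(mask.sum())  # 32 of the 64 tokens are masked
```

Restricting the loss to masked positions is what makes the pre-training task non-trivial: the model must use visible patches (and, for video, temporal context) to infer the missing ones.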
2 code implementations • ICLR 2022 • Agrim Gupta, Linxi Fan, Surya Ganguli, Li Fei-Fei
Multiple domains like vision, natural language, and audio are witnessing tremendous progress by leveraging Transformers for large-scale pre-training followed by task-specific fine-tuning.
1 code implementation • 3 Feb 2021 • Agrim Gupta, Silvio Savarese, Surya Ganguli, Li Fei-Fei
However, the principles governing the relations between environmental complexity, evolved morphology, and the learnability of intelligent control remain elusive, partially due to the substantial challenge of performing large-scale in silico experiments on evolution and learning.
no code implementations • 31 Dec 2020 • Agrim Gupta, Cedric Girerd, Manideep Dunna, Qiming Zhang, Raghav Subbaraman, Tania Morimoto, Dinesh Bharadia
Contact force is a natural way for humans to interact with the physical world around them.
3 code implementations • CVPR 2019 • Agrim Gupta, Piotr Dollár, Ross Girshick
We plan to collect ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images.
4 code implementations • CVPR 2018 • Justin Johnson, Agrim Gupta, Li Fei-Fei
To overcome this limitation, we propose a method for generating images from scene graphs, enabling explicit reasoning about objects and their relationships.
Ranked #4 on Layout-to-Image Generation on Visual Genome 64x64
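A scene graph of the kind this method consumes can be sketched as a set of (subject, predicate, object) triples; the `Triple` class and `scene_graph_objects` helper below are illustrative assumptions for exposition, not the paper's actual data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    """One edge of a scene graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str

def scene_graph_objects(triples):
    """Collect the unique object nodes referenced by a set of triples,
    in first-appearance order."""
    objects = []
    for t in triples:
        for name in (t.subject, t.obj):
            if name not in objects:
                objects.append(name)
    return objects

# A toy scene graph describing a simple outdoor scene.
graph = [
    Triple("sheep", "standing on", "grass"),
    Triple("sky", "above", "grass"),
    Triple("tree", "behind", "sheep"),
]

print(scene_graph_objects(graph))  # ['sheep', 'grass', 'sky', 'tree']
```

Explicitly enumerating objects and their pairwise relationships like this is what lets a graph-conditioned generator reason about scene layout, rather than treating the description as an unstructured caption.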
7 code implementations • CVPR 2018 • Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, Alexandre Alahi
Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments.
Ranked #4 on Trajectory Prediction on ETH
no code implementations • ICCV 2017 • Agrim Gupta, Justin Johnson, Alexandre Alahi, Li Fei-Fei
Recent progress in style transfer on images has focused on improving the quality of stylized images and the speed of stylization methods.