no code implementations • 7 Feb 2024 • Yash Kant, Ziyi Wu, Michael Vasilkovsky, Guocheng Qian, Jian Ren, Riza Alp Guler, Bernard Ghanem, Sergey Tulyakov, Igor Gilitschenski, Aliaksandr Siarohin
We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images.
no code implementations • 1 Feb 2024 • Guocheng Qian, Junli Cao, Aliaksandr Siarohin, Yash Kant, Chaoyang Wang, Michael Vasilkovsky, Hsin-Ying Lee, Yuwei Fang, Ivan Skorokhodov, Peiye Zhuang, Igor Gilitschenski, Jian Ren, Bernard Ghanem, Kfir Aberman, Sergey Tulyakov
We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously.
no code implementations • 21 Dec 2023 • Yen-Chi Cheng, Chieh Hubert Lin, Chaoyang Wang, Yash Kant, Sergey Tulyakov, Alexander Schwing, LiangYan Gui, Hsin-Ying Lee
Toward unlocking the potential of generative models in immersive 4D experiences, we introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment.
no code implementations • 24 Oct 2023 • Yash Kant, Aliaksandr Siarohin, Michael Vasilkovsky, Riza Alp Guler, Jian Ren, Sergey Tulyakov, Igor Gilitschenski
Our approach focuses on maximizing the reuse of visible pixels from the source image.
1 code implementation • 14 Apr 2023 • Tianshu Kuai, Akash Karthikeyan, Yash Kant, Ashkan Mirzaei, Igor Gilitschenski
Animating an object in 3D often requires an articulated structure, e.g., a kinematic chain or skeleton of the manipulated object with proper skinning weights, to obtain smooth movements and surface deformations.
no code implementations • CVPR 2023 • Yash Kant, Aliaksandr Siarohin, Riza Alp Guler, Menglei Chai, Jian Ren, Sergey Tulyakov, Igor Gilitschenski
Next, we combine PIN with a differentiable LBS module to build an expressive and end-to-end Invertible Neural Skinning (INS) pipeline.
no code implementations • 17 Jan 2023 • Aniket Agarwal, Alex Zhang, Karthik Narasimhan, Igor Gilitschenski, Vishvak Murahari, Yash Kant
Our human studies indicate that ASAP can align videos and annotations with high fidelity, precision, and speed.
no code implementations • 4 Jul 2022 • Ashkan Mirzaei, Yash Kant, Jonathan Kelly, Igor Gilitschenski
In this paper, we introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene, known camera poses, a natural language description of the object, and a set of point labels of object and non-object points in the input images.
1 code implementation • 22 May 2022 • Yash Kant, Arun Ramachandran, Sriram Yenamandra, Igor Gilitschenski, Dhruv Batra, Andrew Szot, Harsh Agrawal
Instead, the agent must learn from, and is evaluated against, human preferences about which objects belong where in a tidy house.
no code implementations • 7 Nov 2021 • Shasta Ihorn, Yue-Ting Siu, Aditya Bodi, Lothar Narins, Jose M. Castanon, Yash Kant, Abhishek Das, Ilmi Yoon, Pooyan Fazli
To overcome the increasing gaps in video accessibility, we developed a hybrid system of two tools to 1) automatically generate descriptions for videos and 2) provide answers or additional descriptions in response to user queries on a video.
1 code implementation • ICCV 2021 • Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal
Recent Visual Question Answering (VQA) models have shown impressive performance on the VQA benchmark but remain sensitive to small linguistic variations in input questions.
1 code implementation • ECCV 2020 • Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal
Further, each head in our multi-head self-attention layer focuses on a different subset of relations.
1 code implementation • 28 Jan 2019 • Harshal Mittal, Kartikey Pandey, Yash Kant
The authors try to address this problem by designing a new optimization algorithm that bridges the gap between the space of Adaptive Gradient algorithms and SGD with momentum.