Search Results for author: Yash Kant

Found 13 papers, 5 papers with code

SPAD : Spatially Aware Multiview Diffusers

no code implementations • 7 Feb 2024 • Yash Kant, Ziyi Wu, Michael Vasilkovsky, Guocheng Qian, Jian Ren, Riza Alp Guler, Bernard Ghanem, Sergey Tulyakov, Igor Gilitschenski, Aliaksandr Siarohin

We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images.

3D Generation, Novel View Synthesis, +1

AToM: Amortized Text-to-Mesh using 2D Diffusion

no code implementations • 1 Feb 2024 • Guocheng Qian, Junli Cao, Aliaksandr Siarohin, Yash Kant, Chaoyang Wang, Michael Vasilkovsky, Hsin-Ying Lee, Yuwei Fang, Ivan Skorokhodov, Peiye Zhuang, Igor Gilitschenski, Jian Ren, Bernard Ghanem, Kfir Aberman, Sergey Tulyakov

We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously.

Text to 3D

Virtual Pets: Animatable Animal Generation in 3D Scenes

no code implementations • 21 Dec 2023 • Yen-Chi Cheng, Chieh Hubert Lin, Chaoyang Wang, Yash Kant, Sergey Tulyakov, Alexander Schwing, LiangYan Gui, Hsin-Ying Lee

Toward unlocking the potential of generative models in immersive 4D experiences, we introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment.

iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis

no code implementations • 24 Oct 2023 • Yash Kant, Aliaksandr Siarohin, Michael Vasilkovsky, Riza Alp Guler, Jian Ren, Sergey Tulyakov, Igor Gilitschenski

Our approach focuses on maximizing the reuse of visible pixels from the source image.

Novel View Synthesis

CAMM: Building Category-Agnostic and Animatable 3D Models from Monocular Videos

1 code implementation • 14 Apr 2023 • Tianshu Kuai, Akash Karthikeyan, Yash Kant, Ashkan Mirzaei, Igor Gilitschenski

Animating an object in 3D often requires an articulated structure, e.g., a kinematic chain or skeleton of the manipulated object with proper skinning weights, to obtain smooth movements and surface deformations.

Object Surface Reconstruction

Invertible Neural Skinning

no code implementations • CVPR 2023 • Yash Kant, Aliaksandr Siarohin, Riza Alp Guler, Menglei Chai, Jian Ren, Sergey Tulyakov, Igor Gilitschenski

Next, we combine PIN with a differentiable LBS module to build an expressive and end-to-end Invertible Neural Skinning (INS) pipeline.

Building Scalable Video Understanding Benchmarks through Sports

no code implementations • 17 Jan 2023 • Aniket Agarwal, Alex Zhang, Karthik Narasimhan, Igor Gilitschenski, Vishvak Murahari, Yash Kant

Our human studies indicate that ASAP can align videos and annotations with high fidelity, precision, and speed.

Video Understanding

LaTeRF: Label and Text Driven Object Radiance Fields

no code implementations • 4 Jul 2022 • Ashkan Mirzaei, Yash Kant, Jonathan Kelly, Igor Gilitschenski

In this paper we introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene, known camera poses, a natural language description of the object, and a set of point-labels of object and non-object points in the input images.

Object

Housekeep: Tidying Virtual Households using Commonsense Reasoning

1 code implementation • 22 May 2022 • Yash Kant, Arun Ramachandran, Sriram Yenamandra, Igor Gilitschenski, Dhruv Batra, Andrew Szot, Harsh Agrawal

Instead, the agent must learn from and is evaluated against human preferences of which objects belong where in a tidy house.

Language Modelling, Large Language Model

NarrationBot and InfoBot: A Hybrid System for Automated Video Description

no code implementations • 7 Nov 2021 • Shasta Ihorn, Yue-Ting Siu, Aditya Bodi, Lothar Narins, Jose M. Castanon, Yash Kant, Abhishek Das, Ilmi Yoon, Pooyan Fazli

To overcome the increasing gaps in video accessibility, we developed a hybrid system of two tools to 1) automatically generate descriptions for videos and 2) provide answers or additional descriptions in response to user queries on a video.

Video Description

Contrast and Classify: Training Robust VQA Models

1 code implementation • ICCV 2021 • Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal

Recent Visual Question Answering (VQA) models have shown impressive performance on the VQA benchmark but remain sensitive to small linguistic variations in input questions.

Contrastive Learning, Data Augmentation, +4

Spatially Aware Multimodal Transformers for TextVQA

1 code implementation • ECCV 2020 • Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal

Further, each head in our multi-head self-attention layer focuses on a different subset of relations.

Optical Character Recognition (OCR), Visual Grounding, +1

ICLR Reproducibility Challenge Report (Padam : Closing The Generalization Gap Of Adaptive Gradient Methods in Training Deep Neural Networks)

1 code implementation • 28 Jan 2019 • Harshal Mittal, Kartikey Pandey, Yash Kant

The authors try to address this problem by designing a new optimization algorithm that bridges the gap between the space of Adaptive Gradient algorithms and SGD with momentum.
