Search Results for author: Yash Kant

Found 20 papers, 6 papers with code

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

no code implementations • 3 Mar 2025 • Chen Guo, Junxuan Li, Yash Kant, Yaser Sheikh, Shunsuke Saito, Chen Cao

Once the UPM has learned to accurately reproduce large-scale multi-view human images, we fine-tune it on an in-the-wild video via inverse rendering to obtain a personalized photorealistic human avatar that can be faithfully animated to novel human motions and rendered from novel views.

Inverse Rendering

Pippo: High-Resolution Multi-View Humans from a Single Image

no code implementations • 11 Feb 2025 • Yash Kant, Ethan Weber, Jin Kyu Kim, Rawal Khirodkar, Su Zhaoen, Julieta Martinez, Igor Gilitschenski, Shunsuke Saito, Timur Bagautdinov

Finally, we introduce an improved metric for evaluating the 3D consistency of multi-view generations, and show that Pippo outperforms existing works on multi-view human generation from a single image.

Fillerbuster: Multi-View Scene Completion for Casual Captures

no code implementations • 7 Feb 2025 • Ethan Weber, Norman Müller, Yash Kant, Vasu Agrawal, Michael Zollhöfer, Angjoo Kanazawa, Christian Richardt

Our solution is to train a generative model that can consume a large context of input frames while generating unknown target views and recovering image poses when desired.

Realistic Evaluation of Model Merging for Compositional Generalization

1 code implementation • 26 Sep 2024 • Derek Tam, Yash Kant, Brian Lester, Igor Gilitschenski, Colin Raffel

Merging has become a widespread way to cheaply combine individual models into a single model that inherits their capabilities and attains better performance.

Image Classification · Image Generation
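The simplest instance of merging is weighted parameter averaging. The sketch below is a minimal NumPy illustration of that baseline, not the specific merging methods this paper benchmarks; all names are illustrative.

```python
import numpy as np

def average_merge(state_dicts, weights=None):
    """Merge models by (weighted) parameter averaging.

    `state_dicts` is a list of dicts mapping parameter names to
    numpy arrays; all models must share the same architecture.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Two toy "models" with a single weight matrix each.
a = {"w": np.array([[1.0, 2.0], [3.0, 4.0]])}
b = {"w": np.array([[3.0, 2.0], [1.0, 0.0]])}
print(average_merge([a, b])["w"])  # element-wise mean of the two
```

More sophisticated merging methods reweight or align parameters before combining them, but they share this same per-parameter aggregation structure.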

Pixel-Aligned Multi-View Generation with Depth Guided Decoder

no code implementations • 26 Aug 2024 • Zhenggang Tang, Peiye Zhuang, Chaoyang Wang, Aliaksandr Siarohin, Yash Kant, Alexander Schwing, Sergey Tulyakov, Hsin-Ying Lee

During inference, we employ NeuS, a rapid multi-view-to-3D reconstruction approach, to obtain coarse depth for the depth-truncated epipolar attention.

3D Reconstruction · Decoder +1

Virtual Pets: Animatable Animal Generation in 3D Scenes

no code implementations • 21 Dec 2023 • Yen-Chi Cheng, Chieh Hubert Lin, Chaoyang Wang, Yash Kant, Sergey Tulyakov, Alexander Schwing, Liang-Yan Gui, Hsin-Ying Lee

Toward unlocking the potential of generative models in immersive 4D experiences, we introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment.

NeRF

CAMM: Building Category-Agnostic and Animatable 3D Models from Monocular Videos

1 code implementation • 14 Apr 2023 • Tianshu Kuai, Akash Karthikeyan, Yash Kant, Ashkan Mirzaei, Igor Gilitschenski

Animating an object in 3D often requires an articulated structure, e.g. a kinematic chain or skeleton of the manipulated object with proper skinning weights, to obtain smooth movements and surface deformations.

Object · Surface Reconstruction
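The skeleton-plus-skinning-weights setup described above is typically realized with linear blend skinning (LBS): each vertex is deformed by a weighted blend of per-bone rigid transforms. A minimal NumPy sketch, illustrative rather than CAMM's actual implementation:

```python
import numpy as np

def linear_blend_skinning(rest_verts, bone_transforms, skin_weights):
    """Deform rest-pose vertices with linear blend skinning (LBS).

    rest_verts:      (V, 3) rest-pose vertex positions
    bone_transforms: (B, 4, 4) per-bone rigid transforms (rest -> posed)
    skin_weights:    (V, B) per-vertex weights, each row summing to 1
    """
    V = rest_verts.shape[0]
    homo = np.concatenate([rest_verts, np.ones((V, 1))], axis=1)  # (V, 4)
    # Blend the bone transforms per vertex, then apply each blended
    # transform to its vertex.
    blended = np.einsum("vb,bij->vij", skin_weights, bone_transforms)  # (V, 4, 4)
    posed = np.einsum("vij,vj->vi", blended, homo)
    return posed[:, :3]

# Identity bone transforms leave the rest pose unchanged.
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
identity = np.stack([np.eye(4), np.eye(4)])
weights = np.array([[1.0, 0.0], [0.5, 0.5]])
print(linear_blend_skinning(rest, identity, weights))  # equals rest
```

The skin weights are what make deformations smooth: a vertex influenced by two bones moves with a blend of both, avoiding hard creases at joints.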

Invertible Neural Skinning

no code implementations • CVPR 2023 • Yash Kant, Aliaksandr Siarohin, Riza Alp Guler, Menglei Chai, Jian Ren, Sergey Tulyakov, Igor Gilitschenski

Next, we combine PIN with a differentiable LBS module to build an expressive and end-to-end Invertible Neural Skinning (INS) pipeline.

Building Scalable Video Understanding Benchmarks through Sports

no code implementations • 17 Jan 2023 • Aniket Agarwal, Alex Zhang, Karthik Narasimhan, Igor Gilitschenski, Vishvak Murahari, Yash Kant

Our human studies indicate that ASAP can align videos and annotations with high fidelity, precision, and speed.

Video Understanding

LaTeRF: Label and Text Driven Object Radiance Fields

no code implementations • 4 Jul 2022 • Ashkan Mirzaei, Yash Kant, Jonathan Kelly, Igor Gilitschenski

In this paper, we introduce LaTeRF, a method for extracting an object of interest from a scene, given 2D images of the entire scene, known camera poses, a natural-language description of the object, and a set of point labels marking object and non-object points in the input images.

NeRF · Object

NarrationBot and InfoBot: A Hybrid System for Automated Video Description

no code implementations • 7 Nov 2021 • Shasta Ihorn, Yue-Ting Siu, Aditya Bodi, Lothar Narins, Jose M. Castanon, Yash Kant, Abhishek Das, Ilmi Yoon, Pooyan Fazli

To narrow the growing gap in video accessibility, we developed a hybrid system of two tools that 1) automatically generates descriptions for videos and 2) provides answers or additional descriptions in response to user queries about a video.

Video Description

Contrast and Classify: Training Robust VQA Models

1 code implementation • ICCV 2021 • Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal

Recent Visual Question Answering (VQA) models have shown impressive performance on the VQA benchmark but remain sensitive to small linguistic variations in input questions.

Contrastive Learning · Data Augmentation +4

ICLR Reproducibility Challenge Report (Padam: Closing The Generalization Gap Of Adaptive Gradient Methods in Training Deep Neural Networks)

1 code implementation • 28 Jan 2019 • Harshal Mittal, Kartikey Pandey, Yash Kant

The authors try to address this problem by designing a new optimization algorithm that bridges the gap between the space of Adaptive Gradient algorithms and SGD with momentum.
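The bridge the report describes is Padam's partially adaptive update: an Adam/AMSGrad-style step where the second-moment denominator is raised to a power p in [0, 1/2], so p = 1/2 recovers AMSGrad and p = 0 is roughly SGD with momentum. The NumPy sketch below follows the published update rule; it is not the authors' released code, and the hyperparameter values are illustrative.

```python
import numpy as np

def padam_step(theta, grad, state, lr=0.1, beta1=0.9, beta2=0.999,
               p=0.125, eps=1e-8):
    """One Padam update: Adam-style first and second moments, but the
    denominator is v_hat**p with p in [0, 1/2]."""
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2
    state["v_hat"] = np.maximum(state["v_hat"], state["v"])  # AMSGrad max
    return theta - lr * state["m"] / (state["v_hat"] + eps) ** p

# Minimize f(x) = x^2 starting from x = 3 (gradient is 2x).
x = np.array([3.0])
s = {"m": np.zeros(1), "v": np.zeros(1), "v_hat": np.zeros(1)}
for _ in range(200):
    x = padam_step(x, 2 * x, s)
print(x)  # x has moved toward the minimum at 0
```

Shrinking p interpolates away from fully adaptive scaling, which is the mechanism the authors credit for closing the generalization gap.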
