no code implementations • 31 Oct 2024 • Junxuan Li, Chen Cao, Gabriel Schwartz, Rawal Khirodkar, Christian Richardt, Tomas Simon, Yaser Sheikh, Shunsuke Saito
Unlike existing approaches that estimate parametric reflectance parameters via inverse rendering, our approach directly models learnable radiance transfer that incorporates global light transport in an efficient manner for real-time rendering.
no code implementations • 27 Oct 2024 • Rawal Khirodkar, Jyun-Ting Song, Jinkun Cao, Zhengyi Luo, Kris Kitani
Understanding how humans interact with each other is key to building realistic multi-human virtual reality systems.
2 code implementations • 22 Aug 2024 • Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez, Su Zhaoen, Austin James, Peter Selednik, Stuart Anderson, Shunsuke Saito
We present Sapiens, a family of models for four fundamental human-centric vision tasks -- 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction.
Ranked #1 on Keypoint Detection on MS COCO (Validation AP metric, using extra training data)
no code implementations • 22 Apr 2024 • Mana Masuda, Jinhyung Park, Shun Iwase, Rawal Khirodkar, Kris Kitani
While recent advancements in animatable human rendering have achieved remarkable results, they require test-time optimization for each subject which can be a significant limitation for real-world applications.
no code implementations • CVPR 2024 • Zhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Jing Huang, Kris Kitani, Weipeng Xu
We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR / VR headsets.
no code implementations • 28 Jan 2024 • Yu-Jhe Li, Yan Xu, Rawal Khirodkar, Jinhyung Park, Kris Kitani
In order to evaluate our proposed pipeline, we collect three video sets of RGBD videos recorded from multiple sparse-view depth cameras and ground truth 3D poses are manually annotated.
2 code implementations • CVPR 2024 • Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.
no code implementations • 25 May 2023 • Rawal Khirodkar, Aayush Bansal, Lingni Ma, Richard Newcombe, Minh Vo, Kris Kitani
We present EgoHumans, a new multi-view multi-human video benchmark to advance the state-of-the-art of egocentric human 3D pose estimation and tracking.
no code implementations • ICCV 2023 • Rawal Khirodkar, Aayush Bansal, Lingni Ma, Richard Newcombe, Minh Vo, Kris Kitani
We present EgoHumans, a new multi-view multi-human video benchmark to advance the state-of-the-art of egocentric human 3D pose estimation and tracking.
no code implementations • 8 Oct 2022 • Rawal Khirodkar, Brandon Smith, Siddhartha Chandra, Amit Agrawal, Antonio Criminisi
Ensemble approaches for deep-learning-based semantic segmentation remain insufficiently explored despite the proliferation of competitive benchmarks and downstream applications.
Ranked #10 on Semantic Segmentation on PASCAL Context
7 code implementations • CVPR 2023 • Jinkun Cao, Jiangmiao Pang, Xinshuo Weng, Rawal Khirodkar, Kris Kitani
Instead of relying only on the linear state estimate (i. e., estimation-centric approach), we use object observations (i. e., the measurements by object detector) to compute a virtual trajectory over the occlusion period to fix the error accumulation of filter parameters during the occlusion period.
Ranked #2 on Multiple Object Tracking on CroHD
no code implementations • CVPR 2022 • Rawal Khirodkar, Shashank Tripathi, Kris Kitani
Along with the input image, we condition the top-down model on spatial context from the image in the form of body-center heatmaps.
Ranked #74 on 3D Human Pose Estimation on 3DPW (using extra training data)
1 code implementation • ICCV 2021 • Shun Iwase, Xingyu Liu, Rawal Khirodkar, Rio Yokota, Kris M. Kitani
Furthermore, we utilize differentiable Levenberg-Marquardt (LM) optimization to refine a pose fast and accurately by minimizing the feature-metric error between the input and rendered image representations without the need of zooming in.
Ranked #5 on 6D Pose Estimation using RGB on LineMOD
1 code implementation • ICCV 2021 • Rawal Khirodkar, Visesh Chari, Amit Agrawal, Ambrish Tyagi
Specifically, we achieve 70. 0 AP on CrowdPose and 42. 5 AP on OCHuman test sets, a significant improvement of 2. 4 AP and 6. 5 AP over the prior art, respectively.
Ranked #1 on Multi-Person Pose Estimation on OCHuman
no code implementations • 3 Dec 2018 • Rawal Khirodkar, Kris M. Kitani
Domain Randomization (DR) is known to require a significant amount of training data for good performance.
1 code implementation • 14 Nov 2018 • Rawal Khirodkar, Donghyun Yoo, Kris M. Kitani
We address the issue of domain gap when making use of synthetic data to train a scene-specific object detector and pose estimator.