no code implementations • 27 Oct 2024 • Rawal Khirodkar, Jyun-Ting Song, Jinkun Cao, Zhengyi Luo, Kris Kitani
Understanding how humans interact with each other is key to building realistic multi-human virtual reality systems.
no code implementations • 17 Oct 2024 • Bolin Lai, Sam Toyer, Tushar Nagarajan, Rohit Girdhar, Shengxin Zha, James M. Rehg, Kris Kitani, Kristen Grauman, Ruta Desai, Miao Liu
Predicting future human behavior is an increasingly popular topic in computer vision, driven by the interest in applications such as autonomous vehicles, digital assistants and human-robot interactions.
no code implementations • 30 Sep 2024 • Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, Fu-Jen Chu, Kris Kitani, Gedas Bertasius, Xitong Yang
Goal-oriented planning, or anticipating a series of actions that transition an agent from its current state to a predefined objective, is crucial for developing intelligent assistants aiding users in daily procedural tasks.
no code implementations • 6 Sep 2024 • Jinkun Cao, Jingyuan Liu, Kris Kitani, Yi Zhou
Unlike previous work that generates hand poses for a given object, we aim to let a single model generalize across both hand and object shapes.
no code implementations • 7 Aug 2024 • Zi-Yi Dou, Xitong Yang, Tushar Nagarajan, Huiyu Wang, Jing Huang, Nanyun Peng, Kris Kitani, Fu-Jen Chu
We present EMBED (Egocentric Models Built with Exocentric Data), a method designed to transform exocentric video-language data for egocentric video representation learning.
no code implementations • 1 Aug 2024 • Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos, Kris Kitani, Kristen Grauman
Our method takes a video demonstration and its accompanying 3D body pose and generates (1) free-form expert commentary describing what the person is doing well and what they could improve, and (2) a visual expert demonstration that incorporates the required corrections.
no code implementations • 11 Jul 2024 • Benjamin A. Newman, Pranay Gupta, Kris Kitani, Yonatan Bisk, Henny Admoni, Chris Paxton
De gustibus non est disputandum ("there is no accounting for others' tastes") is a common Latin maxim describing how many solutions in life are determined by people's personal preferences.
no code implementations • 28 Jun 2024 • Zhengyi Luo, Jiashun Wang, Kangni Liu, Haotian Zhang, Chen Tessler, Jingbo Wang, Ye Yuan, Jinkun Cao, Zihui Lin, Fengyi Wang, Jessica Hodgins, Kris Kitani
We present SMPLOlympics, a collection of physically simulated environments that allow humanoids to compete in a variety of Olympic sports.
no code implementations • 13 Jun 2024 • Tairan He, Zhengyi Luo, Xialin He, Wenli Xiao, Chong Zhang, Weinan Zhang, Kris Kitani, Changliu Liu, Guanya Shi
We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy.
no code implementations • 22 Apr 2024 • Mana Masuda, Jinhyung Park, Shun Iwase, Rawal Khirodkar, Kris Kitani
While recent advancements in animatable human rendering have achieved remarkable results, they require test-time optimization for each subject, which can be a significant limitation for real-world applications.
no code implementations • CVPR 2024 • Yufei Ye, Abhinav Gupta, Kris Kitani, Shubham Tulsiani
We propose G-HOP, a denoising diffusion based generative prior for hand-object interactions that allows modeling both the 3D object and a human hand, conditioned on the object category.
1 code implementation • 16 Apr 2024 • Benjamin A Newman, Chris Paxton, Kris Kitani, Henny Admoni
Initializing policies to maximize performance with unknown partners can be achieved by bootstrapping nonlinear models using imitation learning over large, offline datasets.
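As a loose illustration of this bootstrapping step (a sketch under stated assumptions, not the paper's pipeline): a nonlinear policy can be initialized by behavior cloning, i.e., regressing demonstrated actions from states over an offline dataset. The state/action dimensions and network below are placeholders.

```python
import torch
import torch.nn as nn

# Placeholder offline dataset of (state, action) pairs from prior interactions.
states = torch.randn(10_000, 16)   # assumed state dimension
actions = torch.randn(10_000, 4)   # assumed action dimension

policy = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 4))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Behavior cloning: fit the policy to the demonstrated actions.
for _ in range(10):
    loss = ((policy(states) - actions) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```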
no code implementations • 21 Mar 2024 • Shun Iwase, Katherine Liu, Vitor Guizilini, Adrien Gaidon, Kris Kitani, Rares Ambrus, Sergey Zakharov
We present a 3D scene completion method that recovers the complete geometry of multiple unseen objects in complex scenes from a single RGB-D image.
no code implementations • CVPR 2024 • Zhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Jing Huang, Kris Kitani, Weipeng Xu
We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR / VR headsets.
no code implementations • 7 Mar 2024 • Tairan He, Zhengyi Luo, Wenli Xiao, Chong Zhang, Kris Kitani, Changliu Liu, Guanya Shi
We present Human to Humanoid (H2O), a reinforcement learning (RL) based framework that enables real-time whole-body teleoperation of a full-sized humanoid robot with only an RGB camera.
no code implementations • 24 Feb 2024 • Jinkun Cao, Jiangmiao Pang, Kris Kitani
We propose a new visual hierarchical representation paradigm for multi-object tracking.
no code implementations • 19 Feb 2024 • Jiahe Chen, Jinkun Cao, Dahua Lin, Kris Kitani, Jiangmiao Pang
However, mapping from a standard Gaussian with a flow-based model limits the capacity to capture complicated trajectory patterns and ignores motion intentions that are under-represented in the training data.
no code implementations • 28 Jan 2024 • Yu-Jhe Li, Yan Xu, Rawal Khirodkar, Jinhyung Park, Kris Kitani
To evaluate our proposed pipeline, we collect three sets of RGBD videos recorded from multiple sparse-view depth cameras and manually annotate ground-truth 3D poses.
no code implementations • CVPR 2024 • Jinhyung Park, Yu-Jhe Li, Kris Kitani
While recent depth completion methods have achieved remarkable results filling in relatively dense depth maps (e.g., projected 64-line LiDAR on KITTI or 500 sampled points on NYUv2) with RGB guidance, their performance on very sparse input (e.g., 4-line LiDAR or 32 depth point measurements) is unverified.
no code implementations • 4 Dec 2023 • Yan Xu, Kris Kitani
The 2D human poses used in clustering are obtained through a pre-trained 2D pose detector, so our method does not require expensive 3D training data for each new scene.
2 code implementations • CVPR 2024 • Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.
no code implementations • 6 Oct 2023 • Zhengyi Luo, Jinkun Cao, Josh Merel, Alexander Winkler, Jing Huang, Kris Kitani, Weipeng Xu
We close this gap by significantly increasing the coverage of our motion representation space.
no code implementations • ICCV 2023 • Rohan Choudhury, Kris Kitani, Laszlo A. Jeni
In doing so, our model is able to use spatiotemporal context to predict more accurate human poses without sacrificing efficiency.
no code implementations • 25 May 2023 • Rawal Khirodkar, Aayush Bansal, Lingni Ma, Richard Newcombe, Minh Vo, Kris Kitani
We present EgoHumans, a new multi-view multi-human video benchmark to advance the state-of-the-art of egocentric human 3D pose estimation and tracking.
no code implementations • NeurIPS 2023 • Pha Nguyen, Kha Gia Quach, Kris Kitani, Khoa Luu
This paper introduces a novel paradigm for Multiple Object Tracking called Type-to-Track, which allows users to track objects in videos by typing natural language descriptions.
Grounded Multiple Object Tracking • Multiple Object Tracking • +1
1 code implementation • ICCV 2023 • Erica Weng, Hana Hoshino, Deva Ramanan, Kris Kitani
In response to the limitations of marginal metrics, we present the first comprehensive evaluation of state-of-the-art (SOTA) trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate.
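To make the marginal/joint distinction concrete, here is a minimal sketch of ADE versus JADE (JFDE is analogous but uses only the final timestep); the shapes and data are illustrative:

```python
import numpy as np

def min_ade(pred, gt):
    """Marginal ADE: the best of K samples is chosen per agent independently."""
    err = np.linalg.norm(pred - gt[None], axis=-1).mean(axis=-1)  # (K, N)
    return err.min(axis=0).mean()

def jade(pred, gt):
    """Joint ADE: a single sample must explain all agents at once."""
    err = np.linalg.norm(pred - gt[None], axis=-1).mean(axis=(1, 2))  # (K,)
    return err.min()

pred = np.random.randn(20, 3, 12, 2)  # K=20 samples, N=3 agents, T=12 steps
gt = np.random.randn(3, 12, 2)
print(min_ade(pred, gt), jade(pred, gt))  # min_ade <= jade always
```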
no code implementations • ICCV 2023 • Zhengyi Luo, Jinkun Cao, Alexander Winkler, Kris Kitani, Weipeng Xu
We present a physics-based humanoid controller that achieves high-fidelity motion imitation and fault-tolerant behavior in the presence of noisy input (e.g., pose estimates from video or generated from language) and unexpected falls.
no code implementations • CVPR 2023 • Davis Rempe, Zhengyi Luo, Xue Bin Peng, Ye Yuan, Kris Kitani, Karsten Kreis, Sanja Fidler, Or Litany
We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals.
no code implementations • 21 Mar 2023 • Yu-Jhe Li, Tao Xu, Ji Hou, Bichen Wu, Xiaoliang Dai, Albert Pumarola, Peizhao Zhang, Peter Vajda, Kris Kitani
The novelty of our model lies in introducing contrastive learning while training the diffusion prior, which enables the generation of a valid view-invariant latent code.
3 code implementations • 23 Feb 2023 • Gerard Maggiolino, Adnan Ahmad, Jinkun Cao, Kris Kitani
Motion-based association for Multi-Object Tracking (MOT) has recently regained prominence with the rise of powerful object detectors.
Ranked #5 on Multi-Object Tracking on MOT20 (using extra training data)
1 code implementation • CVPR 2023 • Yu-Jhe Li, Shawn Hunt, Jinhyung Park, Matthew O’Toole, Kris Kitani
We also propose a hybrid super-resolution model (Hybrid-SR) combining our ADC-SR with a standard RAD super-resolution model, and show that performance can be improved by a large margin.
no code implementations • 14 Nov 2022 • Dennis Melamed, Karnik Ram, Vivek Roy, Kris Kitani
To address the robustness problem in map utilization, we propose a data-driven prior on possible user locations in a map by combining learned spatial map embeddings and temporal odometry embeddings.
no code implementations • 12 Nov 2022 • Yu-Jhe Li, Tao Xu, Bichen Wu, Ningyuan Zheng, Xiaoliang Dai, Albert Pumarola, Peizhao Zhang, Peter Vajda, Kris Kitani
In the first stage, we introduce a base encoder that converts the input image to a latent code.
no code implementations • 17 Oct 2022 • Jinkun Cao, Hao Wu, Kris Kitani
Experiments on video multi-object tracking (MOT) and multi-object tracking and segmentation (MOTS) datasets demonstrate the effectiveness of the proposed DST position encoding.
Multi-Object Tracking • Multi-Object Tracking and Segmentation • +2
1 code implementation • 5 Oct 2022 • Jinhyung Park, Chenfeng Xu, Shijia Yang, Kurt Keutzer, Kris Kitani, Masayoshi Tomizuka, Wei Zhan
While recent camera-only 3D detection methods leverage multiple timesteps, the limited history they use significantly hampers the extent to which temporal fusion can improve object perception.
Ranked #1 on Robust Camera Only 3D Object Detection on nuScenes-C
no code implementations • 18 Jun 2022 • Zhengyi Luo, Shun Iwase, Ye Yuan, Kris Kitani
Since 2D third-person observations are coupled with the camera pose, we propose to disentangle the camera pose and use a multi-step projection gradient defined in the global coordinate frame as the movement cue for our embodied agent.
Ranked #324 on 3D Human Pose Estimation on Human3.6M
no code implementations • 5 Apr 2022 • Yuda Song, Ye Yuan, Wen Sun, Kris Kitani
Our theoretical analysis shows that our method is a no-regret algorithm and we provide the convergence rate in the agnostic setting.
7 code implementations • CVPR 2023 • Jinkun Cao, Jiangmiao Pang, Xinshuo Weng, Rawal Khirodkar, Kris Kitani
Instead of relying only on the linear state estimate (i.e., an estimation-centric approach), we use object observations (i.e., the measurements from the object detector) to compute a virtual trajectory over the occlusion period, correcting the error that accumulates in the filter parameters while the object is occluded (a simplified sketch follows below).
Ranked #2 on Multiple Object Tracking on CroHD
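A much-simplified sketch of the virtual-trajectory idea, assuming a 1-D state and linear interpolation between the last observation before occlusion and the first re-detection (the actual method operates on full bounding-box states and re-runs the Kalman update along this trajectory):

```python
def virtual_trajectory(z_before, z_after, gap):
    """Pseudo-observations interpolated across an occlusion gap."""
    return [z_before + (z_after - z_before) * t / (gap + 1)
            for t in range(1, gap + 1)]

# Object last observed at x=10, re-detected at x=22 after 3 missed frames;
# the pseudo-observations would be replayed through the filter's update step.
print(virtual_trajectory(10.0, 22.0, 3))  # [13.0, 16.0, 19.0]
```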
no code implementations • CVPR 2022 • Rawal Khirodkar, Shashank Tripathi, Kris Kitani
Along with the input image, we condition the top-down model on spatial context from the image in the form of body-center heatmaps.
Ranked #72 on 3D Human Pose Estimation on 3DPW (using extra training data)
no code implementations • CVPR 2022 • Xinshuo Weng, Boris Ivanovic, Kris Kitani, Marco Pavone
This is typically caused by the propagation of errors from tracking to prediction, such as noisy tracks, fragments, and identity switches.
no code implementations • CVPR 2022 • Yu-Jhe Li, Jinhyung Park, Matthew O'Toole, Kris Kitani
To mitigate this problem, we propose the Self-Training Multimodal Vehicle Detection Network (ST-MVDNet), which leverages a Teacher-Student mutual learning framework and a simulated sensor-noise model for strong data augmentation of the Lidar and Radar inputs.
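A generic sketch of the Teacher-Student mutual-learning ingredient, with a toy network standing in for the detector and Gaussian noise standing in for the simulated sensor-noise augmentation; the loss and update rule here are common conventions (mean-teacher style), not necessarily the paper's exact recipe:

```python
import copy
import torch
import torch.nn as nn

student = nn.Linear(8, 2)            # stand-in for a detection network
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    # Teacher weights track a slow moving average of the student.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1 - momentum)

x = torch.randn(4, 8)
pseudo = teacher(x).detach()               # teacher predictions as pseudo-labels
x_strong = x + 0.1 * torch.randn_like(x)   # stand-in for strong noise augmentation
loss = ((student(x_strong) - pseudo) ** 2).mean()
loss.backward()
ema_update(teacher, student)
```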
1 code implementation • CVPR 2022 • Ye Yuan, Umar Iqbal, Pavlo Molchanov, Kris Kitani, Jan Kautz
Since the joint reconstruction of human motions and camera poses is underconstrained, we propose a global trajectory predictor that generates global human trajectories based on local body movements (a schematic sketch follows below).
Ranked #1 on Global 3D Human Pose Estimation on EMDB
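One way to picture what such a trajectory predictor does (a schematic sketch; the feature dimensions and sequence model are placeholders): map each frame's local body motion to a root-motion delta and integrate the deltas into a global path.

```python
import torch
import torch.nn as nn

local_motion = torch.randn(1, 100, 69)        # (batch, frames, local pose features)
delta_net = nn.GRU(69, 64, batch_first=True)  # placeholder sequence model
head = nn.Linear(64, 3)                       # per-frame root-translation delta

h, _ = delta_net(local_motion)
deltas = head(h)                              # (1, 100, 3)
trajectory = torch.cumsum(deltas, dim=1)      # integrate deltas into a global path
```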
3 code implementations • CVPR 2022 • Peize Sun, Jinkun Cao, Yi Jiang, Zehuan Yuan, Song Bai, Kris Kitani, Ping Luo
A typical pipeline for multi-object tracking (MOT) uses a detector for object localization, followed by re-identification (re-ID) for object association.
2 code implementations • CVPR 2022 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris Kitani, Peter Vajda
To mitigate this problem, we propose a teacher-student framework named Adaptive Teacher (AT) which leverages domain adversarial learning and weak-strong data augmentation to address the domain gap.
1 code implementation • 21 Oct 2021 • Khoa Vo, Hyekang Joo, Kashu Yamazaki, Sang Truong, Kris Kitani, Minh-Triet Tran, Ngan Le
In this paper, we attempt to simulate this human ability by proposing the Actor Environment Interaction (AEI) network to improve video representations for temporal action proposal generation.
8 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.
1 code implementation • ICLR 2022 • Ye Yuan, Yuda Song, Zhengyi Luo, Wen Sun, Kris Kitani
Specifically, we learn a conditional policy that, in an episode, first applies a sequence of transform actions to modify an agent's skeletal structure and joint attributes, and then applies control actions under the new design.
no code implementations • ICCV 2021 • Yunze Man, Xinshuo Weng, Prasanna Kumar Sivakumar, Matthew O'Toole, Kris Kitani
LiDAR sensors can be used to obtain a wide range of measurement signals other than a simple 3D point cloud, and those signals can be leveraged to improve perception tasks like 3D object detection.
1 code implementation • 8 Jul 2021 • Jinhyung Park, Xinshuo Weng, Yunze Man, Kris Kitani
To provide a more integrated approach, we propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions, which are then used to further refine the 3D boxes.
1 code implementation • NeurIPS 2021 • Zhengyi Luo, Ryo Hachiuma, Ye Yuan, Kris Kitani
By comparing the pose instructed by the kinematic model against the pose generated by the dynamics model, we can use their misalignment to further improve the kinematic model (illustrated schematically below).
Egocentric Pose Estimation • Human-Object Interaction Detection • +2
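The misalignment signal can be pictured as a simple per-joint discrepancy between the kinematically instructed pose and the simulated pose; random tensors stand in for the two models' outputs in this schematic:

```python
import torch

kinematic_pose = torch.randn(24, 3, requires_grad=True)  # pose proposed by kinematic model
simulated_pose = torch.randn(24, 3)                      # pose realized by physics simulation

# The mean per-joint discrepancy serves as a training signal for the kinematic model.
misalignment = (kinematic_pose - simulated_pose).norm(dim=-1).mean()
misalignment.backward()  # gradients flow back into the kinematic model only
```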
no code implementations • CVPR 2021 • Yan Xu, Yu-Jhe Li, Xinshuo Weng, Kris Kitani
We address the problem of estimating the 3D pose of a network of cameras for large-environment wide-baseline scenarios, e.g., cameras for construction sites, sports stadiums, and public spaces.
no code implementations • CVPR 2021 • Ye Yuan, Shih-En Wei, Tomas Simon, Kris Kitani, Jason Saragih
Based on this refined kinematic pose, the policy learns to compute dynamics-based control (e.g., joint torques) of the character to advance the current-frame pose estimate to the pose estimate of the next frame.
Ranked #246 on 3D Human Pose Estimation on Human3.6M
2 code implementations • ICCV 2021 • Ye Yuan, Xinshuo Weng, Yanglan Ou, Kris Kitani
Instead, we would prefer a method that allows an agent's state at one time to directly affect another agent's state at a future time (see the sketch below).
Ranked #10 on Trajectory Prediction on ETH/UCY
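In that spirit, a minimal sketch (not the paper's exact architecture): flatten agents and timesteps into one token axis so that attention can directly connect agent i at time t to agent j at time t':

```python
import torch
import torch.nn as nn

N, T, D = 3, 12, 32                # agents, timesteps, feature dim (assumed)
tokens = torch.randn(1, N * T, D)  # one token per (agent, time) pair

attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)
out, weights = attn(tokens, tokens, tokens)
print(weights.shape)  # (1, 36, 36): every (agent, time) pair attends to every other
```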
no code implementations • 7 Mar 2021 • Shengcao Cao, Xiaofang Wang, Kris Kitani
Using a sampling-based search algorithm and parallel computing, our method finds an architecture that outperforms DARTS while reducing wall-clock search time by 80%.
no code implementations • 4 Mar 2021 • Navyata Sanghvi, Shinnosuke Usami, Mohit Sharma, Joachim Groeger, Kris Kitani
Various methods for solving the inverse reinforcement learning (IRL) problem have been developed independently in machine learning and economics.
no code implementations • 27 Feb 2021 • Harsh Agarwal, Navyata Sanghvi, Vivek Roy, Kris Kitani
Accurate smartphone localization (< 1-meter error) for indoor navigation using only RSSI received from a set of BLE beacons remains a challenging problem, due to the inherent noise of RSSI measurements.
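For context on why this is hard (background, not the paper's method): most RSSI ranging rests on the log-distance path-loss model sketched below, where a few dB of indoor fading moves the distance estimate by meters. The tx_power and path-loss exponent are typical placeholder values.

```python
def rssi_to_distance(rssi, tx_power=-59.0, n=2.0):
    """Log-distance path loss: rssi = tx_power - 10 * n * log10(d)."""
    return 10 ** ((tx_power - rssi) / (10 * n))

print(rssi_to_distance(-70.0))  # ~3.5 m nominal
print(rssi_to_distance(-74.0))  # ~5.6 m after only 4 dB of fading
```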
1 code implementation • 8 Feb 2021 • Scott Sun, Dennis Melamed, Kris Kitani
Many smartphone applications use inertial measurement units (IMUs) to sense movement, but the use of these sensors for pedestrian localization can be challenging due to their noise characteristics.
no code implementations • 10 Dec 2020 • Xinshuo Weng, Kris Kitani
Moreover, this threshold is sensitive to many factors, such as the target object category, so it must be re-tuned whenever these factors change.
1 code implementation • ICCV 2021 • Zhiqing Sun, Shengcao Cao, Yiming Yang, Kris Kitani
DETR is a recently proposed Transformer-based method that views object detection as a set prediction problem and achieves state-of-the-art performance, but it demands an extremely long training time to converge.
no code implementations • 13 Oct 2020 • Akiyoshi Kurobe, Yoshikatsu Nakajima, Hideo Saito, Kris Kitani
The ability to both recognize and discover terrain characteristics is an important function required for many autonomous ground robots such as social robots, assistive robots, autonomous vehicles, and ground exploration robots.
no code implementations • 25 Aug 2020 • Xinshuo Weng, Ye Yuan, Kris Kitani
To evaluate this hypothesis, we propose a unified solution for 3D MOT and trajectory forecasting which also incorporates two additional novel computational units.
no code implementations • 22 Aug 2020 • Vivek Roy, Yan Xu, Yu-Xiong Wang, Kris Kitani, Ruslan Salakhutdinov, Martial Hebert
Recent works have proposed to solve this task by augmenting the training data of the few-shot classes using generative models with the few-shot training samples as the seeds.
no code implementations • 20 Aug 2020 • Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani
3D Multi-object tracking (MOT) is crucial to autonomous systems.
no code implementations • 18 Aug 2020 • Xinshuo Weng, Jianren Wang, David Held, Kris Kitani
Additionally, 3D MOT datasets such as KITTI evaluate MOT methods in 2D space, and standardized 3D MOT evaluation tools are missing for a fair comparison of 3D MOT methods.
no code implementations • ECCV 2020 • Mariko Isogawa, Dorian Chan, Ye Yuan, Kris Kitani, Matthew O'Toole
Non-line-of-sight (NLOS) imaging techniques use light that diffusely reflects off of visible surfaces (e.g., walls) to see around corners.
1 code implementation • 27 Jul 2020 • Qiqi Xiao, Jiaxu Zou, Muqiao Yang, Alex Gaudio, Kris Kitani, Asim Smailagic, Pedro Costa, Min Xu
Diabetic Retinopathy (DR) is a leading cause of blindness in working age adults.
1 code implementation • 23 Jun 2020 • Yongxin Wang, Kris Kitani, Xinshuo Weng
Although the two components depend on each other, prior works often design the detection and data association modules separately and train them with separate objectives.
Ranked #1 on Multi-Object Tracking on 2D MOT 2015
no code implementations • 17 Jun 2020 • Xi Sun, Xinshuo Weng, Kris Kitani
We propose a method to learn a visual-inertial feature space in which the motion of a person in video can be easily matched to the motion measured by a wearable inertial measurement unit (IMU).
1 code implementation • 12 Jun 2020 • Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani
As a result, the feature of each object is informed by the features of the other objects, so that it can lean toward objects with similar features (i.e., objects likely sharing the same ID) and deviate from objects with dissimilar features (i.e., objects likely with different IDs), yielding a more discriminative feature for each object; (2) instead of obtaining the feature from either 2D or 3D space as in prior work, we propose a novel joint feature extractor that learns appearance and motion features from 2D and 3D space simultaneously.
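A toy sketch of this feature-interaction idea (the paper uses a graph neural network over joint 2D-3D features; the similarity-weighted update below is a stand-in):

```python
import torch
import torch.nn.functional as F

feats = torch.randn(5, 64)                       # one feature per detected object
sim = feats @ feats.t() / feats.shape[1] ** 0.5  # pairwise similarity
weights = F.softmax(sim, dim=-1)                 # attention over all objects
feats = weights @ feats                          # each feature informed by the others
```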
1 code implementation • NeurIPS 2020 • Ye Yuan, Kris Kitani
Our approach is the first humanoid control method that successfully learns from a large-scale human motion dataset (Human3.6M) and generates diverse long-term motions.
no code implementations • 6 Jun 2020 • S. Alireza Golestaneh, Kris Kitani
In our experiments, we demonstrate that by utilizing multi-task learning and our proposed feature fusion method, our model yields better performance for the NR-IQA task.
1 code implementation • CVPR 2020 • Mariko Isogawa, Ye Yuan, Matthew O'Toole, Kris Kitani
We bring together a diverse set of technologies from NLOS imaging, human pose estimation and deep reinforcement learning to construct an end-to-end data processing pipeline that converts a raw stream of photon measurements into a full 3D human pose sequence estimate.
1 code implementation • ECCV 2020 • Ye Yuan, Kris Kitani
To obtain samples from a pretrained generative model, most existing generative human motion prediction methods draw a set of independent Gaussian latent codes and convert them to motion samples (contrast this with the correlated-code sketch below).
Ranked #1 on Human Pose Forecasting on AMASS (APD metric)
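The contrast can be sketched as follows: rather than K independent draws, a single Gaussian draw is pushed through K learned affine maps, producing correlated codes that can be trained to cover diverse futures. The maps here are random stand-ins for learned parameters.

```python
import torch

K, D = 10, 32                   # number of futures, latent dimension (assumed)
A = torch.randn(K, D, D) * 0.1  # stand-ins for learned per-sample linear maps
b = torch.randn(K, D) * 0.1     # stand-ins for learned per-sample offsets

eps = torch.randn(D)                       # one shared Gaussian draw
z = torch.einsum('kij,j->ki', A, eps) + b  # K correlated latent codes
# Each z[k] would be decoded by a pretrained motion generator into one future.
```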
no code implementations • 18 Mar 2020 • Xinshuo Weng, Jianren Wang, Sergey Levine, Kris Kitani, Nicholas Rhinehart
Through experiments on a robotic manipulation dataset and two driving datasets, we show that SPFNet is effective for the SPF task, that our forecast-then-detect pipeline outperforms the detect-then-forecast approaches we compare against, and that pose forecasting performance improves with the addition of unlabeled data.
no code implementations • 17 Mar 2020 • Xinshuo Weng, Ye Yuan, Kris Kitani
We evaluate on KITTI and nuScenes datasets showing that our method with socially-aware feature learning and diversity sampling achieves new state-of-the-art performance on 3D MOT and trajectory prediction.
no code implementations • 12 Dec 2019 • Yan Xu, Vivek Roy, Kris Kitani
We propose an alternative strategy for extracting 3D information to solve for camera pose by using pedestrian trajectories.
no code implementations • ICCV 2019 • Yoshikatsu Nakajima, Byeongkeun Kang, Hideo Saito, Kris Kitani
This work addresses the task of open world semantic segmentation using RGBD sensing to discover new semantic classes over time.
no code implementations • ICLR 2020 • Ye Yuan, Kris Kitani
To learn the parameters of the DSF, the diversity of the trajectory samples is evaluated by a diversity loss based on a determinantal point process (DPP); a minimal sketch follows below.
Ranked #5 on Human Pose Forecasting on HumanEva-I
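A minimal sketch of a DPP-style diversity loss, assuming an RBF kernel over flattened trajectory samples (the paper's exact kernel and normalization may differ): the log-determinant grows as samples spread apart, so its negative rewards diversity.

```python
import torch

samples = torch.randn(8, 24, requires_grad=True)  # 8 flattened trajectory samples

d2 = torch.cdist(samples, samples) ** 2  # pairwise squared distances
K = torch.exp(-d2 / d2.mean().detach())  # RBF similarity kernel

loss = -torch.logdet(K + torch.eye(8))   # more diverse samples -> lower loss
loss.backward()                          # gradient pushes samples apart
```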
1 code implementation • 9 Jul 2019 • Xinshuo Weng, Jianren Wang, David Held, Kris Kitani
Additionally, 3D MOT datasets such as KITTI evaluate MOT methods in the 2D space, and standardized 3D MOT evaluation tools are missing for a fair comparison of 3D MOT methods.
Ranked #16 on Multiple Object Tracking on KITTI Test (Online Methods)
1 code implementation • ICCV 2019 • Ye Yuan, Kris Kitani
We propose the use of a proportional-derivative (PD) control based policy learned via reinforcement learning (RL) to estimate and forecast 3D human pose from egocentric videos.
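The PD-control layer itself is standard; a one-line sketch with made-up gains (the policy outputs the target angle each step, and the simulator applies the resulting torque):

```python
def pd_torque(q_target, q, q_dot, kp=300.0, kd=30.0):
    """PD control: drive joint angle q toward the policy's target angle."""
    return kp * (q_target - q) - kd * q_dot

print(pd_torque(q_target=0.5, q=0.2, q_dot=1.0))  # 300*0.3 - 30*1.0 = 60.0
```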
1 code implementation • 24 May 2019 • J. D. Curtó, I. C. Zarza, Kris Kitani, Irwin King, Michael R. Lyu
Dr. of Crosswise proposes a new architecture to reduce over-parametrization in Neural Networks.
no code implementations • 4 May 2019 • Xinshuo Weng, Kris Kitani
We evaluate different combinations of front-end and back-end modules with the grayscale video and optical flow inputs on the LRW dataset.
2 code implementations • ICCV 2019 • Nicholas Rhinehart, Rowan Mcallister, Kris Kitani, Sergey Levine
For autonomous vehicles (AVs) to behave appropriately on roads populated by human-driven vehicles, they must be able to reason about the uncertain intentions and decisions of other drivers from rich perceptual information.
1 code implementation • 23 Mar 2019 • Xinshuo Weng, Kris Kitani
Following the pipeline of two-stage 3D detection algorithms, we detect 2D object proposals in the input image and extract a point cloud frustum from the pseudo-LiDAR for each proposal.
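A sketch of frustum extraction from a pseudo-LiDAR depth map, assuming pinhole intrinsics (the KITTI-like values below are placeholders): pixels inside the 2D proposal are back-projected into a 3D point frustum.

```python
import numpy as np

def pseudo_lidar_frustum(depth, box, fx, fy, cx, cy):
    """Back-project the depth pixels inside a 2D box into a 3D point frustum."""
    u1, v1, u2, v2 = box
    vs, us = np.mgrid[v1:v2, u1:u2]
    z = depth[v1:v2, u1:u2]
    x = (us - cx) * z / fx  # pinhole back-projection
    y = (vs - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

depth = np.random.uniform(5, 50, (375, 1242))  # stand-in predicted depth map
pts = pseudo_lidar_frustum(depth, (100, 50, 200, 150), 721.5, 721.5, 609.6, 172.9)
print(pts.shape)                               # (10000, 3)
```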
1 code implementation • 4 Mar 2019 • Navyata Sanghvi, Ryo Yonetani, Kris Kitani
Toward enabling next-generation robots capable of socially intelligent interaction with humans, we present a computational model of interactions in a social environment of multiple agents and multiple groups.
no code implementations • 17 Nov 2018 • Yunze Man, Xinshuo Weng, Xi Li, Kris Kitani
We focus on estimating the 3D orientation of the ground plane from a single image.
no code implementations • ECCV 2018 • Ye Yuan, Kris Kitani
Motivated by this, we propose a novel control-based approach to model human motion with physics simulation and use imitation learning to learn a video-conditioned control policy for ego-pose estimation.
no code implementations • 22 Jul 2018 • Minjie Cai, Kris Kitani, Yoichi Sato
In the proposed model, we explore various semantic relationships between actions, grasp types and object attributes, and show how the context can be used to boost the recognition of each component.
no code implementations • 11 Apr 2018 • Eshed Ohn-Bar, Kris Kitani, Chieko Asakawa
Consider an assistive system that guides visually impaired users through speech and haptic feedback to their destination.
Model-based Reinforcement Learning • Reinforcement Learning • +1
no code implementations • 19 Jun 2017 • Xinshuo Weng, Shangxuan Wu, Fares Beainy, Kris Kitani
To address this issue, we propose a Rotational Rectification Network (R2N) that can be inserted into any CNN-based pedestrian (or object) detector to adapt it to significant changes in camera rotation.
no code implementations • 15 Dec 2016 • Namhoon Lee, Xinshuo Weng, Vishnu Naresh Boddeti, Yu Zhang, Fares Beainy, Kris Kitani, Takeo Kanade
We introduce the concept of a Visual Compiler that generates a scene specific pedestrian detector and pose estimator without any pedestrian observations.