no code implementations • ECCV 2020 • Wen-Hsuan Chu, Kris M. Kitani
In this work, our key hypothesis is that this change in loss values during training can be used as a feature to identify anomalous data.
no code implementations • 7 Nov 2024 • Rohan Choudhury, Guanglei Zhu, Sihan Liu, Koichiro Niinuma, Kris M. Kitani, László Jeni
Our method is content-aware, requiring no tuning for different datasets, and fast, incurring negligible overhead.
no code implementations • 1 Dec 2023 • Rohan Choudhury, Koichiro Niinuma, Kris M. Kitani, László A. Jeni
We propose to answer zero-shot questions about videos by generating short procedural programs that derive a final answer from solving a sequence of visual subtasks.
Ranked #15 on Zero-Shot Video Question Answer on NExT-QA
no code implementations • ICCV 2023 • Qichen Fu, Xingyu Liu, Ran Xu, Juan Carlos Niebles, Kris M. Kitani
Accurately estimating 3D hand pose is crucial for understanding how humans interact with the world.
no code implementations • 8 Dec 2022 • Xingyu Liu, Deepak Pathak, Kris M. Kitani
The ability to learn from human demonstration endows robots with the ability to automate various tasks.
no code implementations • 18 Jun 2022 • Zhengyi Luo, Ye Yuan, Kris M. Kitani
Second, we use a design-and-control framework to optimize a humanoid's physical attributes to find body designs that can better imitate the pre-specified human motion sequence(s).
no code implementations • 2 May 2022 • Xiaofang Wang, Kris M. Kitani
While progress has been encouraging, we observe an overlooked issue: it is not yet common practice to compare different 3D detectors under the same cost, e.g., inference latency.
no code implementations • 16 Mar 2022 • Takehiko Ohkawa, Yu-Jhe Li, Qichen Fu, Ryosuke Furuta, Kris M. Kitani, Yoichi Sato
We aim to improve the performance of regressing hand keypoints and segmenting pixel-level hand masks under new imaging conditions (e.g., outdoors) when we only have labeled images taken under very different conditions (e.g., indoors).
1 code implementation • 10 Feb 2022 • Xingyu Liu, Deepak Pathak, Kris M. Kitani
We interpolate between the source robot and the target robot by finding a continuous evolutionary change of robot parameters.
no code implementations • 7 Nov 2021 • Xingyu Liu, Kris M. Kitani
Manipulating articulated objects requires multiple robot arms in general.
1 code implementation • CVPR 2022 • Qichen Fu, Xingyu Liu, Kris M. Kitani
While our voting function is able to improve the bounding box of the active object, one round of voting is typically not enough to accurately localize the active object.
no code implementations • 29 Sep 2021 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris M. Kitani, Peter Vajda
This enables the student model to capture domain-invariant features.
no code implementations • ICCV 2021 • Xingyu Liu, Shun Iwase, Kris M. Kitani
We present a large-scale stereo RGB image object pose estimation dataset named the StereOBJ-1M dataset.
no code implementations • 21 Sep 2021 • Xingyu Liu, Shun Iwase, Kris M. Kitani
To address this problem, we propose a novel continuous representation called Keypoint Distance Field (KDF) for projected 2D keypoint locations.
1 code implementation • 16 Aug 2021 • S. Alireza Golestaneh, Saba Dadsetan, Kris M. Kitani
Specifically, we enforce self-consistency between the outputs of our quality assessment model for each image and its transformation (horizontally flipped) to utilize the rich self-supervisory information and reduce the uncertainty of the model.
Ranked #3 on No-Reference Image Quality Assessment on TID2013
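The flip self-consistency idea in the entry above can be illustrated with a minimal sketch. It assumes the consistency term is the absolute difference between the model's score for an image and for its horizontal flip; the names `hflip`, `flip_consistency_loss`, and `toy_quality_model` are hypothetical and are not taken from the paper, whose actual loss may differ.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values


def hflip(image: Image) -> Image:
    """Return the image mirrored left-to-right."""
    return [row[::-1] for row in image]


def flip_consistency_loss(model: Callable[[Image], float], image: Image) -> float:
    """Penalty that is zero when the model scores an image and its flip identically.

    Assumption: an absolute-difference penalty; the paper may use another distance.
    """
    return abs(model(image) - model(hflip(image)))


def toy_quality_model(image: Image) -> float:
    """Hypothetical stand-in for a quality predictor.

    It weights left columns more heavily, so it is deliberately not flip-invariant.
    """
    score = 0.0
    for row in image:
        for col, pixel in enumerate(row):
            score += pixel / (col + 1)
    return score


if __name__ == "__main__":
    img = [[1.0, 0.0], [0.0, 0.0]]
    print(flip_consistency_loss(toy_quality_model, img))
```

Minimizing such a term alongside the supervised loss would push the predictor toward flip-invariant scores without needing extra labels.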
no code implementations • 13 May 2021 • Xiaofang Wang, Shengcao Cao, Mengtian Li, Kris M. Kitani
To facilitate the application to gradient-based algorithms, we also propose a differentiable representation for the neighborhood of architectures.
1 code implementation • ICCV 2021 • Shun Iwase, Xingyu Liu, Rawal Khirodkar, Rio Yokota, Kris M. Kitani
Furthermore, we utilize differentiable Levenberg-Marquardt (LM) optimization to refine a pose quickly and accurately by minimizing the feature-metric error between the input and rendered image representations without the need for zooming in.
Ranked #5 on 6D Pose Estimation using RGB on LineMOD
no code implementations • ICCV 2021 • Yu-Jhe Li, Xinshuo Weng, Yan Xu, Kris M. Kitani
We propose an inter-tracklet (person to person) attention mechanism that learns a representation for a target tracklet while taking into account other tracklets across multiple views.
no code implementations • ICLR 2022 • Xiaofang Wang, Dan Kondratyuk, Eric Christiansen, Kris M. Kitani, Yair Alon, Elad Eban
Committee-based models (ensembles or cascades) construct models by combining existing pre-trained ones.
no code implementations • 10 Nov 2020 • Zhengyi Luo, Ryo Hachiuma, Ye Yuan, Shun Iwase, Kris M. Kitani
We propose a method for incorporating object interaction and human body dynamics into the task of 3D ego-pose estimation using a head-mounted camera.
2 code implementations • 9 Aug 2020 • Zhengyi Luo, S. Alireza Golestaneh, Kris M. Kitani
Experiments show that our method produces both smooth and accurate 3D human pose and motion estimates.
Ranked #16 on 3D Human Pose Estimation on 3DPW (Acceleration Error metric, using extra training data)
no code implementations • 4 Aug 2020 • S. Alireza Golestaneh, Kris M. Kitani
We address the task of active learning in the context of semantic segmentation and show that self-consistency can be a powerful source of self-supervision to greatly improve the performance of a data-driven model with access to only a small amount of labeled data.
no code implementations • ECCV 2020 • Xiaofang Wang, Xuehan Xiong, Maxim Neumann, AJ Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani, Wei Hua
The discovered attention cells can be seamlessly inserted into existing backbone networks, e.g., I3D or S3D, and improve video classification accuracy by more than 2% on both Kinetics-600 and MiT datasets.
no code implementations • 16 Mar 2020 • Yu-Jhe Li, Zhengyi Luo, Xinshuo Weng, Kris M. Kitani
To tackle the re-ID problem in the context of clothing changes, we propose a novel representation learning model which is able to generate a body shape feature representation without being affected by clothing color or patterns.
no code implementations • 15 May 2019 • Andrey Rudenko, Luigi Palmieri, Michael Herman, Kris M. Kitani, Dariu M. Gavrila, Kai O. Arras
With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important.
no code implementations • CVPR 2020 • Jiaqi Guan, Ye Yuan, Kris M. Kitani, Nicholas Rhinehart
Automatically reasoning about future human behaviors is a difficult problem but has significant practical applications to assistive systems.
1 code implementation • 21 Mar 2019 • Aashi Manglik, Xinshuo Weng, Eshed Ohn-Bar, Kris M. Kitani
Our results show that our proposed multi-stream CNN is the best model for predicting time to near-collision.
2 code implementations • ICLR 2019 • Shengcao Cao, Xiaofang Wang, Kris M. Kitani
We also demonstrate that the learned embedding space can be transferred to new settings for architecture search, such as a larger teacher network or a teacher network in a different architecture family, without any training.
no code implementations • 3 Dec 2018 • Rawal Khirodkar, Kris M. Kitani
Domain Randomization (DR) is known to require a significant amount of training data for good performance.
1 code implementation • 14 Nov 2018 • Rawal Khirodkar, Donghyun Yoo, Kris M. Kitani
We address the issue of domain gap when making use of synthetic data to train a scene-specific object detector and pose estimator.
no code implementations • ICLR 2019 • Arjun Sharma, Mohit Sharma, Nicholas Rhinehart, Kris M. Kitani
The use of imitation learning to learn a single policy for a complex task that has multiple modes or hierarchical structure can be challenging.
no code implementations • ECCV 2018 • Nicholas Rhinehart, Kris M. Kitani, Paul Vernaza
We propose a method to forecast a vehicle's ego-motion as a distribution over spatiotemporal paths, conditioned on features (e.g., from LIDAR and images) embedded in an overhead map.
no code implementations • 6 Aug 2018 • Xiang Xu, Xiaofang Wang, Kris M. Kitani
We propose to use the concept of the Hamming bound to derive the optimal criteria for learning hash codes with a deep network.
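For context, the Hamming bound from coding theory limits how many binary codewords of length n can have pairwise Hamming distance at least d. The sketch below computes that classical bound only; it is not the learning criteria the paper derives from it, and the function name `hamming_bound` is an illustrative choice.

```python
from math import comb


def hamming_bound(n: int, d: int) -> int:
    """Upper bound on the number of binary codewords of length n
    with minimum pairwise Hamming distance d (sphere-packing bound)."""
    t = (d - 1) // 2  # radius of the non-overlapping Hamming balls
    ball_volume = sum(comb(n, k) for k in range(t + 1))
    # The codeword count is an integer, so the bound can be floored.
    return 2 ** n // ball_volume


if __name__ == "__main__":
    print(hamming_bound(7, 3))
```

For n = 7 and d = 3 the bound is 128 / (1 + 7) = 16, which the (7, 4) Hamming code attains.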
no code implementations • 22 Jun 2018 • Xinlei Pan, Eshed Ohn-Bar, Nicholas Rhinehart, Yan Xu, Yilin Shen, Kris M. Kitani
The learning process is interactive, with a human expert first providing input in the form of full demonstrations along with some subgoal states.
no code implementations • 20 Jun 2018 • Tanmay Shankar, Nicholas Rhinehart, Katharina Muelling, Kris M. Kitani
We introduce a novel deterministic policy gradient update, DRAG (i.e., DeteRministically AGgrevate) in the form of a deterministic actor-critic variant of AggreVaTeD, to train our neural parser.
no code implementations • 21 Oct 2017 • Sima Behpour, Kris M. Kitani, Brian D. Ziebart
We aim to find optimal adversarial perturbations of the ground truth data (i.e., the worst-case perturbations) that force the object bounding box predictor to learn from the hardest distribution of perturbed examples for better test-time performance.
no code implementations • 6 Oct 2017 • Donghyun Yoo, Haoqi Fan, Vishnu Naresh Boddeti, Kris M. Kitani
To efficiently search for optimal groupings conditioned on the input data, we propose a reinforcement learning search strategy using recurrent networks to learn the optimal group assignments for each network layer.
no code implementations • NeurIPS 2017 • Arun Venkatraman, Nicholas Rhinehart, Wen Sun, Lerrel Pinto, Martial Hebert, Byron Boots, Kris M. Kitani, J. Andrew Bagnell
We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations.
no code implementations • 22 Sep 2017 • Mohit Sharma, Kris M. Kitani, Joachim Groeger
We make an important connection to existing results in econometrics to describe an alternative formulation of inverse reinforcement learning (IRL).
no code implementations • ICLR 2018 • Anubhav Ashok, Nicholas Rhinehart, Fares Beainy, Kris M. Kitani
Our approach takes a larger 'teacher' network as input and outputs a compressed 'student' network derived from the 'teacher' network.
no code implementations • ICCV 2017 • Ryo Yonetani, Vishnu Naresh Boddeti, Kris M. Kitani, Yoichi Sato
We propose a privacy-preserving framework for learning visual classifiers by leveraging distributed private image data.
no code implementations • ICCV 2017 • Nicholas Rhinehart, Kris M. Kitani
We address the problem of incrementally modeling and forecasting long-term goals of a first-person camera wearer: what the user will do, where they will go, and what goal they seek.
1 code implementation • 12 Dec 2016 • Xiaofang Wang, Yi Shi, Kris M. Kitani
The current state-of-the-art deep hashing method DPSH (Li et al., 2015), which is based on pairwise labels, performs image feature learning and hash code learning simultaneously by maximizing the likelihood of pairwise similarities.
no code implementations • 9 Dec 2016 • Yubo Zhang, Vishnu Naresh Boddeti, Kris M. Kitani
Concretely, our approach uses two convolutional neural networks: (1) a gesture network that uses pre-defined motion information to detect the hand region; and (2) an appearance network that learns a person-specific model of the hand region based on the output of the gesture network.
no code implementations • 8 Dec 2016 • Xiaofang Wang, Kris M. Kitani, Martial Hebert
Given a query image, a second positive image and a third negative image, dissimilar to the first two images, we define a contextualized similarity search criterion.
no code implementations • 1 Dec 2016 • Jonathan Shen, Noranart Vesdapunt, Vishnu N. Boddeti, Kris M. Kitani
It has been observed that many of the parameters of a large network are redundant, allowing for the possibility of learning a smaller network that mimics the outputs of the large network through a process called Knowledge Distillation.
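The mimicking objective mentioned above is commonly written as a divergence between temperature-softened teacher and student output distributions. The sketch below assumes that widely used formulation (KL divergence scaled by the squared temperature); the paper's exact objective may differ, and the function names are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: the common soft-target formulation, not necessarily the paper's.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

The loss is zero when the student reproduces the teacher's softened distribution exactly and grows as the two diverge.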
no code implementations • 15 Jun 2016 • Ryo Yonetani, Kris M. Kitani, Yoichi Sato
We envision a future time when wearable cameras are worn by the masses, recording first-person point-of-view videos of everyday life.
no code implementations • CVPR 2016 • Ryo Yonetani, Kris M. Kitani, Yoichi Sato
We aim to understand the dynamics of social interactions between two people by recognizing their actions and reactions using a head-mounted camera.
no code implementations • CVPR 2016 • Minghuang Ma, Haoqi Fan, Kris M. Kitani
Our appearance stream encodes prior knowledge of the egocentric paradigm by explicitly training the network to segment hands and localize objects.
no code implementations • CVPR 2016 • Nicholas Rhinehart, Kris M. Kitani
When people observe and interact with physical spaces, they are able to associate functionality to regions in the environment.
no code implementations • CVPR 2017 • Wei-Chiu Ma, De-An Huang, Namhoon Lee, Kris M. Kitani
We develop predictive models of pedestrian dynamics by encoding the coupled nature of multi-pedestrian interaction using game theory, and deep learning-based visual analysis to estimate person-specific behavior parameters.
no code implementations • CVPR 2015 • Hironori Hattori, Vishnu Naresh Boddeti, Kris M. Kitani, Takeo Kanade
Our experiments also yield a surprising result: our method, using purely synthetic data, is able to outperform models trained on real scene-specific data when data is limited.
no code implementations • CVPR 2015 • De-An Huang, Minghuang Ma, Wei-Chiu Ma, Kris M. Kitani
Furthermore, we develop a hierarchical extension to the DPP clustering algorithm and show that it can be used to discover appearance-based grasp taxonomies.
no code implementations • CVPR 2015 • Ryo Yonetani, Kris M. Kitani, Yoichi Sato
We incorporate this feature into our proposed approach that computes the motion correlation over supervoxel hierarchies to localize target instances in observer videos.
no code implementations • CVPR 2013 • Cheng Li, Kris M. Kitani
Our analysis highlights the effectiveness of sparse features and the importance of modeling global illumination.