Search Results for author: Kris M. Kitani

Found 54 papers, 9 papers with code

Zero-Shot Video Question Answering with Procedural Programs

no code implementations 1 Dec 2023 Rohan Choudhury, Koichiro Niinuma, Kris M. Kitani, László A. Jeni

We propose to answer zero-shot questions about videos by generating short procedural programs that derive a final answer from solving a sequence of visual subtasks.

Code Generation Language Modelling +6

HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration

no code implementations 8 Dec 2022 Xingyu Liu, Deepak Pathak, Kris M. Kitani

The ability to learn from human demonstration endows robots with the ability to automate various tasks.

From Universal Humanoid Control to Automatic Physically Valid Character Creation

no code implementations 18 Jun 2022 Zhengyi Luo, Ye Yuan, Kris M. Kitani

Second, we use a design-and-control framework to optimize a humanoid's physical attributes to find body designs that can better imitate the pre-specified human motion sequence(s).

Humanoid Control valid

Cost-Aware Evaluation and Model Scaling for LiDAR-Based 3D Object Detection

no code implementations 2 May 2022 Xiaofang Wang, Kris M. Kitani

While progress has been encouraging, we observe an overlooked issue: it is not yet common practice to compare different 3D detectors under the same cost, e.g., inference latency.

3D Object Detection object-detection

Domain Adaptive Hand Keypoint and Pixel Localization in the Wild

no code implementations 16 Mar 2022 Takehiko Ohkawa, Yu-Jhe Li, Qichen Fu, Ryosuke Furuta, Kris M. Kitani, Yoichi Sato

We aim to improve the performance of regressing hand keypoints and segmenting pixel-level hand masks under new imaging conditions (e.g., outdoors) when we only have labeled images taken under very different conditions (e.g., indoors).

Domain Adaptation Knowledge Distillation

REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer

1 code implementation 10 Feb 2022 Xingyu Liu, Deepak Pathak, Kris M. Kitani

We interpolate between the source robot and the target robot by finding a continuous evolutionary change of robot parameters.

Imitation Learning

Sequential Voting with Relational Box Fields for Active Object Detection

1 code implementation CVPR 2022 Qichen Fu, Xingyu Liu, Kris M. Kitani

While our voting function is able to improve the bounding box of the active object, one round of voting is typically not enough to accurately localize the active object.

Active Object Detection Imitation Learning +4

KDFNet: Learning Keypoint Distance Field for 6D Object Pose Estimation

no code implementations 21 Sep 2021 Xingyu Liu, Shun Iwase, Kris M. Kitani

To address this problem, we propose a novel continuous representation called Keypoint Distance Field (KDF) for projected 2D keypoint locations.

6D Pose Estimation using RGB

No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency

1 code implementation 16 Aug 2021 S. Alireza Golestaneh, Saba Dadsetan, Kris M. Kitani

Specifically, we enforce self-consistency between the outputs of our quality assessment model for each image and its transformation (horizontally flipped) to utilize the rich self-supervisory information and reduce the uncertainty of the model.

No-Reference Image Quality Assessment NR-IQA +1

Neighborhood-Aware Neural Architecture Search

no code implementations 13 May 2021 Xiaofang Wang, Shengcao Cao, Mengtian Li, Kris M. Kitani

To facilitate the application to gradient-based algorithms, we also propose a differentiable representation for the neighborhood of architectures.

Neural Architecture Search

RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering

1 code implementation ICCV 2021 Shun Iwase, Xingyu Liu, Rawal Khirodkar, Rio Yokota, Kris M. Kitani

Furthermore, we utilize differentiable Levenberg-Marquardt (LM) optimization to refine a pose quickly and accurately by minimizing the feature-metric error between the input and rendered image representations without the need for zooming in.

6D Pose Estimation 6D Pose Estimation using RGB +1

Visio-Temporal Attention for Multi-Camera Multi-Target Association

no code implementations ICCV 2021 Yu-Jhe Li, Xinshuo Weng, Yan Xu, Kris M. Kitani

We propose an inter-tracklet (person-to-person) attention mechanism that learns a representation for a target tracklet while taking into account other tracklets across multiple views.

Kinematics-Guided Reinforcement Learning for Object-Aware 3D Ego-Pose Estimation

no code implementations 10 Nov 2020 Zhengyi Luo, Ryo Hachiuma, Ye Yuan, Shun Iwase, Kris M. Kitani

We propose a method for incorporating object interaction and human body dynamics into the task of 3D ego-pose estimation using a head-mounted camera.

Human-Object Interaction Detection Object +4

3D Human Motion Estimation via Motion Compression and Refinement

2 code implementations 9 Aug 2020 Zhengyi Luo, S. Alireza Golestaneh, Kris M. Kitani

Experiments show that our method produces both smooth and accurate 3D human pose and motion estimates.

Ranked #14 on 3D Human Pose Estimation on 3DPW (Acceleration Error metric, using extra training data)

3D Human Pose Estimation

Importance of Self-Consistency in Active Learning for Semantic Segmentation

no code implementations 4 Aug 2020 S. Alireza Golestaneh, Kris M. Kitani

We address the task of active learning in the context of semantic segmentation and show that self-consistency can be a powerful source of self-supervision to greatly improve the performance of a data-driven model with access to only a small amount of labeled data.

Active Learning Segmentation +1

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification

no code implementations ECCV 2020 Xiaofang Wang, Xuehan Xiong, Maxim Neumann, AJ Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani, Wei Hua

The discovered attention cells can be seamlessly inserted into existing backbone networks, e.g., I3D or S3D, and improve video classification accuracy by more than 2% on both Kinetics-600 and MiT datasets.

Classification General Classification +1

Learning Shape Representations for Clothing Variations in Person Re-Identification

no code implementations 16 Mar 2020 Yu-Jhe Li, Zhengyi Luo, Xinshuo Weng, Kris M. Kitani

To tackle the re-ID problem in the context of clothing changes, we propose a novel representation learning model which is able to generate a body shape feature representation without being affected by clothing color or patterns.

Disentanglement Person Re-Identification

Human Motion Trajectory Prediction: A Survey

no code implementations 15 May 2019 Andrey Rudenko, Luigi Palmieri, Michael Herman, Kris M. Kitani, Dariu M. Gavrila, Kai O. Arras

With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important.

Trajectory Prediction

Generative Hybrid Representations for Activity Forecasting with No-Regret Learning

no code implementations CVPR 2020 Jiaqi Guan, Ye Yuan, Kris M. Kitani, Nicholas Rhinehart

Automatically reasoning about future human behaviors is a difficult problem but has significant practical applications to assistive systems.

Future Near-Collision Prediction from Monocular Video: Feasibility, Dataset, and Challenges

1 code implementation 21 Mar 2019 Aashi Manglik, Xinshuo Weng, Eshed Ohn-Bar, Kris M. Kitani

Our results show that our proposed multi-stream CNN is the best model for predicting time to near-collision.

Robotics

Learnable Embedding Space for Efficient Neural Architecture Compression

2 code implementations ICLR 2019 Shengcao Cao, Xiaofang Wang, Kris M. Kitani

We also demonstrate that the learned embedding space can be transferred to new settings for architecture search, such as a larger teacher network or a teacher network in a different architecture family, without any training.

Bayesian Optimization Neural Architecture Search

Adversarial Domain Randomization

no code implementations 3 Dec 2018 Rawal Khirodkar, Kris M. Kitani

Domain Randomization (DR) is known to require a significant amount of training data for good performance.

Domain Adaptation Image Classification +2

Domain Randomization for Scene-Specific Car Detection and Pose Estimation

1 code implementation 14 Nov 2018 Rawal Khirodkar, Donghyun Yoo, Kris M. Kitani

We address the issue of domain gap when making use of synthetic data to train a scene-specific object detector and pose estimator.

3D Pose Estimation Instance Segmentation +1

Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information

no code implementations ICLR 2019 Arjun Sharma, Mohit Sharma, Nicholas Rhinehart, Kris M. Kitani

The use of imitation learning to learn a single policy for a complex task that has multiple modes or hierarchical structure can be challenging.

Imitation Learning

R2P2: A ReparameteRized Pushforward Policy for Diverse, Precise Generative Path Forecasting

no code implementations ECCV 2018 Nicholas Rhinehart, Kris M. Kitani, Paul Vernaza

We propose a method to forecast a vehicle's ego-motion as a distribution over spatiotemporal paths, conditioned on features (e.g., from LIDAR and images) embedded in an overhead map.

Error Correction Maximization for Deep Image Hashing

no code implementations 6 Aug 2018 Xiang Xu, Xiaofang Wang, Kris M. Kitani

We propose to use the concept of the Hamming bound to derive the optimal criteria for learning hash codes with a deep network.

Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning

no code implementations 22 Jun 2018 Xinlei Pan, Eshed Ohn-Bar, Nicholas Rhinehart, Yan Xu, Yilin Shen, Kris M. Kitani

The learning process is interactive, with a human expert first providing input in the form of full demonstrations along with some subgoal states.

reinforcement-learning Reinforcement Learning (RL)

Learning Neural Parsers with Deterministic Differentiable Imitation Learning

no code implementations 20 Jun 2018 Tanmay Shankar, Nicholas Rhinehart, Katharina Muelling, Kris M. Kitani

We introduce a novel deterministic policy gradient update, DRAG (i.e., DeteRministically AGgrevate), in the form of a deterministic actor-critic variant of AggreVaTeD, to train our neural parser.

Imitation Learning

ADA: A Game-Theoretic Perspective on Data Augmentation for Object Detection

no code implementations 21 Oct 2017 Sima Behpour, Kris M. Kitani, Brian D. Ziebart

We aim to find optimal adversarial perturbations of the ground truth data (i.e., the worst-case perturbations) that force the object bounding box predictor to learn from the hardest distribution of perturbed examples for better test-time performance.

Data Augmentation Object +3

Efficient K-Shot Learning with Regularized Deep Networks

no code implementations 6 Oct 2017 Donghyun Yoo, Haoqi Fan, Vishnu Naresh Boddeti, Kris M. Kitani

To efficiently search for optimal groupings conditioned on the input data, we propose a reinforcement learning search strategy using recurrent networks to learn the optimal group assignments for each network layer.

Predictive-State Decoders: Encoding the Future into Recurrent Networks

no code implementations NeurIPS 2017 Arun Venkatraman, Nicholas Rhinehart, Wen Sun, Lerrel Pinto, Martial Hebert, Byron Boots, Kris M. Kitani, J. Andrew Bagnell

We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations.

Imitation Learning

Inverse Reinforcement Learning with Conditional Choice Probabilities

no code implementations 22 Sep 2017 Mohit Sharma, Kris M. Kitani, Joachim Groeger

We make an important connection to existing results in econometrics to describe an alternative formulation of inverse reinforcement learning (IRL).

Econometrics reinforcement-learning +1

First-Person Activity Forecasting with Online Inverse Reinforcement Learning

no code implementations ICCV 2017 Nicholas Rhinehart, Kris M. Kitani

We address the problem of incrementally modeling and forecasting long-term goals of a first-person camera wearer: what the user will do, where they will go, and what goal they seek.

reinforcement-learning Reinforcement Learning (RL) +1

Deep Supervised Hashing with Triplet Labels

1 code implementation 12 Dec 2016 Xiaofang Wang, Yi Shi, Kris M. Kitani

The current state-of-the-art deep hashing method DPSH (Li et al., 2015), which is based on pairwise labels, performs image feature learning and hash code learning simultaneously by maximizing the likelihood of pairwise similarities.

Deep Hashing Image Retrieval

Gesture-based Bootstrapping for Egocentric Hand Segmentation

no code implementations 9 Dec 2016 Yubo Zhang, Vishnu Naresh Boddeti, Kris M. Kitani

Concretely, our approach uses two convolutional neural networks: (1) a gesture network that uses pre-defined motion information to detect the hand region; and (2) an appearance network that learns a person specific model of the hand region based on the output of the gesture network.

Hand Segmentation

Contextual Visual Similarity

no code implementations 8 Dec 2016 Xiaofang Wang, Kris M. Kitani, Martial Hebert

Given a query image, a second positive image, and a third negative image that is dissimilar to the first two, we define a contextualized similarity search criterion.

Attribute Image Retrieval +1

In Teacher We Trust: Learning Compressed Models for Pedestrian Detection

no code implementations 1 Dec 2016 Jonathan Shen, Noranart Vesdapunt, Vishnu N. Boddeti, Kris M. Kitani

It has been observed that many of the parameters of a large network are redundant, allowing for the possibility of learning a smaller network that mimics the outputs of the large network through a process called Knowledge Distillation.

Knowledge Distillation Pedestrian Detection

Ego-Surfing: Person Localization in First-Person Videos Using Ego-Motion Signatures

no code implementations 15 Jun 2016 Ryo Yonetani, Kris M. Kitani, Yoichi Sato

We envision a future time when wearable cameras are worn by the masses and record first-person point-of-view videos of everyday life.

Clustering Retrieval +1

Recognizing Micro-Actions and Reactions From Paired Egocentric Videos

no code implementations CVPR 2016 Ryo Yonetani, Kris M. Kitani, Yoichi Sato

We aim to understand the dynamics of social interactions between two people by recognizing their actions and reactions using a head-mounted camera.

Video Summarization

Going Deeper into First-Person Activity Recognition

no code implementations CVPR 2016 Minghuang Ma, Haoqi Fan, Kris M. Kitani

Our appearance stream encodes prior knowledge of the egocentric paradigm by explicitly training the network to segment hands and localize objects.

Action Recognition Object +1

Learning Action Maps of Large Environments via First-Person Vision

no code implementations CVPR 2016 Nicholas Rhinehart, Kris M. Kitani

When people observe and interact with physical spaces, they are able to associate functionality to regions in the environment.

Forecasting Interactive Dynamics of Pedestrians with Fictitious Play

no code implementations CVPR 2017 Wei-Chiu Ma, De-An Huang, Namhoon Lee, Kris M. Kitani

We develop predictive models of pedestrian dynamics by encoding the coupled nature of multi-pedestrian interaction using game theory, and deep learning-based visual analysis to estimate person-specific behavior parameters.

Decision Making

Learning Scene-Specific Pedestrian Detectors Without Real Data

no code implementations CVPR 2015 Hironori Hattori, Vishnu Naresh Boddeti, Kris M. Kitani, Takeo Kanade

Our results also yield a surprising finding: our method, using purely synthetic data, is able to outperform models trained on real scene-specific data when data is limited.

Pedestrian Detection

How Do We Use Our Hands? Discovering a Diverse Set of Common Grasps

no code implementations CVPR 2015 De-An Huang, Minghuang Ma, Wei-Chiu Ma, Kris M. Kitani

Furthermore, we develop a hierarchical extension to the DPP clustering algorithm and show that it can be used to discover appearance-based grasp taxonomies.

Clustering Online Clustering

Ego-Surfing First-Person Videos

no code implementations CVPR 2015 Ryo Yonetani, Kris M. Kitani, Yoichi Sato

We incorporate this feature into our proposed approach that computes the motion correlation over supervoxel hierarchies to localize target instances in observer videos.

Pixel-Level Hand Detection in Ego-centric Videos

no code implementations CVPR 2013 Cheng Li, Kris M. Kitani

Our analysis highlights the effectiveness of sparse features and the importance of modeling global illumination.

Hand Detection Sign Language Recognition
