Search Results for author: Kris M. Kitani

Found 54 papers, 9 papers with code

Neural Batch Sampling with Reinforcement Learning for Semi-Supervised Anomaly Detection

no code implementations • ECCV 2020 • Wen-Hsuan Chu, Kris M. Kitani

In this work, our key hypothesis is that this change in loss values during training can be used as a feature to identify anomalous data.

reinforcement-learning Reinforcement Learning (RL) +2

Zero-Shot Video Question Answering with Procedural Programs

no code implementations • 1 Dec 2023 • Rohan Choudhury, Koichiro Niinuma, Kris M. Kitani, László A. Jeni

We propose to answer zero-shot questions about videos by generating short procedural programs that derive a final answer from solving a sequence of visual subtasks.

Ranked #5 on Zero-Shot Video Question Answer on NExT-QA

Code Generation Language Modelling +6

Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation

no code implementations • ICCV 2023 • Qichen Fu, Xingyu Liu, Ran Xu, Juan Carlos Niebles, Kris M. Kitani

Accurately estimating 3D hand pose is crucial for understanding how humans interact with the world.

Hand Pose Estimation

HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration

no code implementations • 8 Dec 2022 • Xingyu Liu, Deepak Pathak, Kris M. Kitani

The ability to learn from human demonstration endows robots with the ability to automate various tasks.

From Universal Humanoid Control to Automatic Physically Valid Character Creation

no code implementations • 18 Jun 2022 • Zhengyi Luo, Ye Yuan, Kris M. Kitani

Second, we use a design-and-control framework to optimize a humanoid's physical attributes to find body designs that can better imitate the pre-specified human motion sequence(s).

Humanoid Control valid

Cost-Aware Evaluation and Model Scaling for LiDAR-Based 3D Object Detection

no code implementations • 2 May 2022 • Xiaofang Wang, Kris M. Kitani

While progress has been encouraging, we observe an overlooked issue: it is not yet common practice to compare different 3D detectors under the same cost, e.g., inference latency.

3D Object Detection object-detection

Domain Adaptive Hand Keypoint and Pixel Localization in the Wild

no code implementations • 16 Mar 2022 • Takehiko Ohkawa, Yu-Jhe Li, Qichen Fu, Ryosuke Furuta, Kris M. Kitani, Yoichi Sato

We aim to improve the performance of regressing hand keypoints and segmenting pixel-level hand masks under new imaging conditions (e.g., outdoors) when we only have labeled images taken under very different conditions (e.g., indoors).

Domain Adaptation Knowledge Distillation

REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer

1 code implementation • 10 Feb 2022 • Xingyu Liu, Deepak Pathak, Kris M. Kitani

We interpolate between the source robot and the target robot by finding a continuous evolutionary change of robot parameters.

Imitation Learning

V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects

no code implementations • 7 Nov 2021 • Xingyu Liu, Kris M. Kitani

Manipulating articulated objects requires multiple robot arms in general.

Object

Sequential Voting with Relational Box Fields for Active Object Detection

1 code implementation • CVPR 2022 • Qichen Fu, Xingyu Liu, Kris M. Kitani

While our voting function is able to improve the bounding box of the active object, one round of voting is typically not enough to accurately localize the active object.

Active Object Detection Imitation Learning +4

Adaptive Unbiased Teacher for Cross-Domain Object Detection

no code implementations • 29 Sep 2021 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris M. Kitani, Peter Vajda

This enables the student model to capture domain-invariant features.

Data Augmentation Domain Adaptation +3

StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation

no code implementations • ICCV 2021 • Xingyu Liu, Shun Iwase, Kris M. Kitani

We present a large-scale stereo RGB image object pose estimation dataset named the StereOBJ-1M dataset.

6D Pose Estimation using RGB Object +1

KDFNet: Learning Keypoint Distance Field for 6D Object Pose Estimation

no code implementations • 21 Sep 2021 • Xingyu Liu, Shun Iwase, Kris M. Kitani

To address this problem, we propose a novel continuous representation called Keypoint Distance Field (KDF) for projected 2D keypoint locations.

6D Pose Estimation using RGB

No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency

1 code implementation • 16 Aug 2021 • S. Alireza Golestaneh, Saba Dadsetan, Kris M. Kitani

Specifically, we enforce self-consistency between the outputs of our quality assessment model for each image and its transformation (horizontally flipped) to utilize the rich self-supervisory information and reduce the uncertainty of the model.

Ranked #3 on No-Reference Image Quality Assessment on TID2013

No-Reference Image Quality Assessment NR-IQA +1

Neighborhood-Aware Neural Architecture Search

no code implementations • 13 May 2021 • Xiaofang Wang, Shengcao Cao, Mengtian Li, Kris M. Kitani

To facilitate the application to gradient-based algorithms, we also propose a differentiable representation for the neighborhood of architectures.

Neural Architecture Search

RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering

1 code implementation • ICCV 2021 • Shun Iwase, Xingyu Liu, Rawal Khirodkar, Rio Yokota, Kris M. Kitani

Furthermore, we utilize differentiable Levenberg-Marquardt (LM) optimization to refine a pose quickly and accurately by minimizing the feature-metric error between the input and rendered image representations without the need for zooming in.

Ranked #5 on 6D Pose Estimation using RGB on LineMOD

6D Pose Estimation 6D Pose Estimation using RGB +1

Visio-Temporal Attention for Multi-Camera Multi-Target Association

no code implementations • ICCV 2021 • Yu-Jhe Li, Xinshuo Weng, Yan Xu, Kris M. Kitani

We propose an inter-tracklet (person to person) attention mechanism that learns a representation for a target tracklet while taking into account other tracklets across multiple views.

Wisdom of Committees: An Overlooked Approach To Faster and More Accurate Models

no code implementations • ICLR 2022 • Xiaofang Wang, Dan Kondratyuk, Eric Christiansen, Kris M. Kitani, Yair Alon, Elad Eban

Committee-based models (ensembles or cascades) construct models by combining existing pre-trained ones.

General Classification Image Classification +3

Kinematics-Guided Reinforcement Learning for Object-Aware 3D Ego-Pose Estimation

no code implementations • 10 Nov 2020 • Zhengyi Luo, Ryo Hachiuma, Ye Yuan, Shun Iwase, Kris M. Kitani

We propose a method for incorporating object interaction and human body dynamics into the task of 3D ego-pose estimation using a head-mounted camera.

Human-Object Interaction Detection Object +4

3D Human Motion Estimation via Motion Compression and Refinement

2 code implementations • 9 Aug 2020 • Zhengyi Luo, S. Alireza Golestaneh, Kris M. Kitani

Experiments show that our method produces both smooth and accurate 3D human pose and motion estimates.

Ranked #14 on 3D Human Pose Estimation on 3DPW (Acceleration Error metric, using extra training data)

3D Human Pose Estimation

Importance of Self-Consistency in Active Learning for Semantic Segmentation

no code implementations • 4 Aug 2020 • S. Alireza Golestaneh, Kris M. Kitani

We address the task of active learning in the context of semantic segmentation and show that self-consistency can be a powerful source of self-supervision to greatly improve the performance of a data-driven model with access to only a small amount of labeled data.

Active Learning Segmentation +1

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification

no code implementations • ECCV 2020 • Xiaofang Wang, Xuehan Xiong, Maxim Neumann, AJ Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani, Wei Hua

The discovered attention cells can be seamlessly inserted into existing backbone networks, e.g., I3D or S3D, and improve video classification accuracy by more than 2% on both Kinetics-600 and MiT datasets.

Classification General Classification +1

Learning Shape Representations for Clothing Variations in Person Re-Identification

no code implementations • 16 Mar 2020 • Yu-Jhe Li, Zhengyi Luo, Xinshuo Weng, Kris M. Kitani

To tackle the re-ID problem in the context of clothing changes, we propose a novel representation learning model which is able to generate a body shape feature representation without being affected by clothing color or patterns.

Disentanglement Person Re-Identification

Human Motion Trajectory Prediction: A Survey

no code implementations • 15 May 2019 • Andrey Rudenko, Luigi Palmieri, Michael Herman, Kris M. Kitani, Dariu M. Gavrila, Kai O. Arras

With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important.

Trajectory Prediction

Generative Hybrid Representations for Activity Forecasting with No-Regret Learning

no code implementations • CVPR 2020 • Jiaqi Guan, Ye Yuan, Kris M. Kitani, Nicholas Rhinehart

Automatically reasoning about future human behaviors is a difficult problem but has significant practical applications to assistive systems.

Future Near-Collision Prediction from Monocular Video: Feasibility, Dataset, and Challenges

1 code implementation • 21 Mar 2019 • Aashi Manglik, Xinshuo Weng, Eshed Ohn-Bar, Kris M. Kitani

Our results show that our proposed multi-stream CNN is the best model for predicting time to near-collision.

Robotics

Learnable Embedding Space for Efficient Neural Architecture Compression

2 code implementations • ICLR 2019 • Shengcao Cao, Xiaofang Wang, Kris M. Kitani

We also demonstrate that the learned embedding space can be transferred to new settings for architecture search, such as a larger teacher network or a teacher network in a different architecture family, without any training.

Bayesian Optimization Neural Architecture Search

Adversarial Domain Randomization

no code implementations • 3 Dec 2018 • Rawal Khirodkar, Kris M. Kitani

Domain Randomization (DR) is known to require a significant amount of training data for good performance.

Domain Adaptation Image Classification +2

Domain Randomization for Scene-Specific Car Detection and Pose Estimation

1 code implementation • 14 Nov 2018 • Rawal Khirodkar, Donghyun Yoo, Kris M. Kitani

We address the issue of domain gap when making use of synthetic data to train a scene-specific object detector and pose estimator.

3D Pose Estimation Instance Segmentation +1

Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information

no code implementations • ICLR 2019 • Arjun Sharma, Mohit Sharma, Nicholas Rhinehart, Kris M. Kitani

The use of imitation learning to learn a single policy for a complex task that has multiple modes or hierarchical structure can be challenging.

Imitation Learning

R2P2: A ReparameteRized Pushforward Policy for Diverse, Precise Generative Path Forecasting

no code implementations • ECCV 2018 • Nicholas Rhinehart, Kris M. Kitani, Paul Vernaza

We propose a method to forecast a vehicle's ego-motion as a distribution over spatiotemporal paths, conditioned on features (e.g., from LIDAR and images) embedded in an overhead map.

Error Correction Maximization for Deep Image Hashing

no code implementations • 6 Aug 2018 • Xiang Xu, Xiaofang Wang, Kris M. Kitani

We propose to use the concept of the Hamming bound to derive the optimal criteria for learning hash codes with a deep network.

Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning

no code implementations • 22 Jun 2018 • Xinlei Pan, Eshed Ohn-Bar, Nicholas Rhinehart, Yan Xu, Yilin Shen, Kris M. Kitani

The learning process is interactive, with a human expert first providing input in the form of full demonstrations along with some subgoal states.

reinforcement-learning Reinforcement Learning (RL)

Learning Neural Parsers with Deterministic Differentiable Imitation Learning

no code implementations • 20 Jun 2018 • Tanmay Shankar, Nicholas Rhinehart, Katharina Muelling, Kris M. Kitani

We introduce a novel deterministic policy gradient update, DRAG (i.e., DeteRministically AGgrevate) in the form of a deterministic actor-critic variant of AggreVaTeD, to train our neural parser.

Imitation Learning

ADA: A Game-Theoretic Perspective on Data Augmentation for Object Detection

no code implementations • 21 Oct 2017 • Sima Behpour, Kris M. Kitani, Brian D. Ziebart

We aim to find optimal adversarial perturbations of the ground truth data (i.e., the worst-case perturbations) that force the object bounding box predictor to learn from the hardest distribution of perturbed examples for better test-time performance.

Data Augmentation Object +3

Efficient K-Shot Learning with Regularized Deep Networks

no code implementations • 6 Oct 2017 • Donghyun Yoo, Haoqi Fan, Vishnu Naresh Boddeti, Kris M. Kitani

To efficiently search for optimal groupings conditioned on the input data, we propose a reinforcement learning search strategy using recurrent networks to learn the optimal group assignments for each network layer.

Predictive-State Decoders: Encoding the Future into Recurrent Networks

no code implementations • NeurIPS 2017 • Arun Venkatraman, Nicholas Rhinehart, Wen Sun, Lerrel Pinto, Martial Hebert, Byron Boots, Kris M. Kitani, J. Andrew Bagnell

We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations.

Imitation Learning

Inverse Reinforcement Learning with Conditional Choice Probabilities

no code implementations • 22 Sep 2017 • Mohit Sharma, Kris M. Kitani, Joachim Groeger

We make an important connection to existing results in econometrics to describe an alternative formulation of inverse reinforcement learning (IRL).

Econometrics reinforcement-learning +1

N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning

no code implementations • ICLR 2018 • Anubhav Ashok, Nicholas Rhinehart, Fares Beainy, Kris M. Kitani

Our approach takes a larger 'teacher' network as input and outputs a compressed 'student' network derived from the 'teacher' network.

Model Compression reinforcement-learning +2

Privacy-Preserving Visual Learning Using Doubly Permuted Homomorphic Encryption

no code implementations • ICCV 2017 • Ryo Yonetani, Vishnu Naresh Boddeti, Kris M. Kitani, Yoichi Sato

We propose a privacy-preserving framework for learning visual classifiers by leveraging distributed private image data.

Privacy Preserving

First-Person Activity Forecasting with Online Inverse Reinforcement Learning

no code implementations • ICCV 2017 • Nicholas Rhinehart, Kris M. Kitani

We address the problem of incrementally modeling and forecasting long-term goals of a first-person camera wearer: what the user will do, where they will go, and what goal they seek.

reinforcement-learning Reinforcement Learning (RL) +1

Deep Supervised Hashing with Triplet Labels

1 code implementation • 12 Dec 2016 • Xiaofang Wang, Yi Shi, Kris M. Kitani

The current state-of-the-art deep hashing method DPSH (Li et al., 2015), which is based on pairwise labels, performs image feature learning and hash code learning simultaneously by maximizing the likelihood of pairwise similarities.

Deep Hashing Image Retrieval

Gesture-based Bootstrapping for Egocentric Hand Segmentation

no code implementations • 9 Dec 2016 • Yubo Zhang, Vishnu Naresh Boddeti, Kris M. Kitani

Concretely, our approach uses two convolutional neural networks: (1) a gesture network that uses pre-defined motion information to detect the hand region; and (2) an appearance network that learns a person specific model of the hand region based on the output of the gesture network.

Hand Segmentation

Contextual Visual Similarity

no code implementations • 8 Dec 2016 • Xiaofang Wang, Kris M. Kitani, Martial Hebert

Given a query image, a second positive image and a third negative image, dissimilar to the first two images, we define a contextualized similarity search criterion.

Attribute Image Retrieval +1

In Teacher We Trust: Learning Compressed Models for Pedestrian Detection

no code implementations • 1 Dec 2016 • Jonathan Shen, Noranart Vesdapunt, Vishnu N. Boddeti, Kris M. Kitani

It has been observed that many of the parameters of a large network are redundant, allowing for the possibility of learning a smaller network that mimics the outputs of the large network through a process called Knowledge Distillation.

Knowledge Distillation Pedestrian Detection

Ego-Surfing: Person Localization in First-Person Videos Using Ego-Motion Signatures

no code implementations • 15 Jun 2016 • Ryo Yonetani, Kris M. Kitani, Yoichi Sato

We envision a future time when wearable cameras are worn by the masses and record first-person point-of-view videos of everyday life.

Clustering Retrieval +1

Recognizing Micro-Actions and Reactions From Paired Egocentric Videos

no code implementations • CVPR 2016 • Ryo Yonetani, Kris M. Kitani, Yoichi Sato

We aim to understand the dynamics of social interactions between two people by recognizing their actions and reactions using a head-mounted camera.

Video Summarization

Going Deeper into First-Person Activity Recognition

no code implementations • CVPR 2016 • Minghuang Ma, Haoqi Fan, Kris M. Kitani

Our appearance stream encodes prior knowledge of the egocentric paradigm by explicitly training the network to segment hands and localize objects.

Action Recognition Object +1

Learning Action Maps of Large Environments via First-Person Vision

no code implementations • CVPR 2016 • Nicholas Rhinehart, Kris M. Kitani

When people observe and interact with physical spaces, they are able to associate functionality with regions in the environment.

Forecasting Interactive Dynamics of Pedestrians with Fictitious Play

no code implementations • CVPR 2017 • Wei-Chiu Ma, De-An Huang, Namhoon Lee, Kris M. Kitani

We develop predictive models of pedestrian dynamics by encoding the coupled nature of multi-pedestrian interaction using game theory, and deep learning-based visual analysis to estimate person-specific behavior parameters.

Decision Making

Learning Scene-Specific Pedestrian Detectors Without Real Data

no code implementations • CVPR 2015 • Hironori Hattori, Vishnu Naresh Boddeti, Kris M. Kitani, Takeo Kanade

Our experiments also yield a surprising result: our method, using purely synthetic data, is able to outperform models trained on real scene-specific data when data is limited.

Pedestrian Detection

How Do We Use Our Hands? Discovering a Diverse Set of Common Grasps

no code implementations • CVPR 2015 • De-An Huang, Minghuang Ma, Wei-Chiu Ma, Kris M. Kitani

Furthermore, we develop a hierarchical extension to the DPP clustering algorithm and show that it can be used to discover appearance-based grasp taxonomies.

Clustering Online Clustering

Ego-Surfing First-Person Videos

no code implementations • CVPR 2015 • Ryo Yonetani, Kris M. Kitani, Yoichi Sato

We incorporate this feature into our proposed approach that computes the motion correlation over supervoxel hierarchies to localize target instances in observer videos.

Pixel-Level Hand Detection in Ego-centric Videos

no code implementations • CVPR 2013 • Cheng Li, Kris M. Kitani

Our analysis highlights the effectiveness of sparse features and the importance of modeling global illumination.

Hand Detection Sign Language Recognition
