Search Results for author: Xinshuo Weng

Found 32 papers, 13 papers with code

MTP: Multi-Hypothesis Tracking and Prediction for Reduced Error Propagation

1 code implementation 18 Oct 2021 Xinshuo Weng, Boris Ivanovic, Marco Pavone

Recently, there has been tremendous progress in developing each individual module of the standard perception-planning robot autonomy pipeline, including detection, tracking, prediction of other agents' trajectories, and ego-agent trajectory planning.

Trajectory Planning

Multi-Echo LiDAR for 3D Object Detection

no code implementations ICCV 2021 Yunze Man, Xinshuo Weng, Prasanna Kumar Sivakuma, Matthew O'Toole, Kris Kitani

LiDAR sensors can be used to obtain a wide range of measurement signals other than a simple 3D point cloud, and those signals can be leveraged to improve perception tasks like 3D object detection.

3D Object Detection object-detection

Multi-Modality Task Cascade for 3D Object Detection

1 code implementation 8 Jul 2021 Jinhyung Park, Xinshuo Weng, Yunze Man, Kris Kitani

To provide a more integrated approach, we propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions, which are then used to further refine the 3D boxes.

3D Object Detection object-detection

Wide-Baseline Multi-Camera Calibration using Person Re-Identification

no code implementations CVPR 2021 Yan Xu, Yu-Jhe Li, Xinshuo Weng, Kris Kitani

We address the problem of estimating the 3D pose of a network of cameras for large-environment wide-baseline scenarios, e.g., cameras for construction sites, sports stadiums, and public spaces.

Camera Calibration Person Re-Identification

Supervision by Registration and Triangulation for Landmark Detection

1 code implementation 25 Jan 2021 Xuanyi Dong, Yi Yang, Shih-En Wei, Xinshuo Weng, Yaser Sheikh, Shoou-I Yu

End-to-end training is made possible by differentiable registration and 3D triangulation modules.

Optical Flow Estimation

Visio-Temporal Attention for Multi-Camera Multi-Target Association

no code implementations ICCV 2021 Yu-Jhe Li, Xinshuo Weng, Yan Xu, Kris M. Kitani

We propose an inter-tracklet (person-to-person) attention mechanism that learns a representation for a target tracklet while taking into account other tracklets across multiple views.

AutoSelect: Automatic and Dynamic Detection Selection for 3D Multi-Object Tracking

no code implementations 10 Dec 2020 Xinshuo Weng, Kris Kitani

Also, this threshold is sensitive to many factors such as target object category so we need to re-search the threshold if these factors change.

3D Multi-Object Tracking

End-to-End 3D Multi-Object Tracking and Trajectory Forecasting

no code implementations 25 Aug 2020 Xinshuo Weng, Ye Yuan, Kris Kitani

To evaluate this hypothesis, we propose a unified solution for 3D MOT and trajectory forecasting which also incorporates two additional novel computational units.

3D Multi-Object Tracking Trajectory Forecasting

AB3DMOT: A Baseline for 3D Multi-Object Tracking and New Evaluation Metrics

no code implementations 18 Aug 2020 Xinshuo Weng, Jianren Wang, David Held, Kris Kitani

Additionally, 3D MOT datasets such as KITTI evaluate MOT methods in 2D space and standardized 3D MOT evaluation tools are missing for a fair comparison of 3D MOT methods.

3D Multi-Object Tracking Autonomous Driving

Joint Object Detection and Multi-Object Tracking with Graph Neural Networks

1 code implementation 23 Jun 2020 Yongxin Wang, Kris Kitani, Xinshuo Weng

Despite the fact that the two components are dependent on each other, prior works often design detection and data association modules separately which are trained with separate objectives.

Multi-Object Tracking object-detection +1

When We First Met: Visual-Inertial Person Localization for Co-Robot Rendezvous

no code implementations 17 Jun 2020 Xi Sun, Xinshuo Weng, Kris Kitani

We propose a method to learn a visual-inertial feature space in which the motion of a person in video can be easily matched to the motion measured by a wearable inertial measurement unit (IMU).

GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning

1 code implementation 12 Jun 2020 Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani

As a result, the feature of one object is informed by the features of other objects, so that it can lean towards objects with similar features (i.e., objects that probably share the same ID) and deviate from objects with dissimilar features (i.e., objects that probably have different IDs), leading to a more discriminative feature for each object; (2) instead of obtaining the feature from either 2D or 3D space as in prior work, we propose a novel joint feature extractor to learn appearance and motion features from 2D and 3D space simultaneously.

3D Multi-Object Tracking

GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi-Feature Learning

1 code implementation CVPR 2020 Xinshuo Weng, Yongxin Wang, Yunze Man, Kris M. Kitani

As a result, the feature of one object is informed by the features of other objects, so that it can lean towards objects with similar features (i.e., objects that probably share the same ID) and deviate from objects with dissimilar features (i.e., objects that probably have different IDs), leading to a more discriminative feature for each object; (2) instead of obtaining the feature from either 2D or 3D space as in prior work, we propose a novel joint feature extractor to learn appearance and motion features from 2D and 3D space simultaneously.

3D Multi-Object Tracking

Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting

no code implementations 18 Mar 2020 Xinshuo Weng, Jianren Wang, Sergey Levine, Kris Kitani, Nicholas Rhinehart

Through experiments on a robotic manipulation dataset and two driving datasets, we show that SPFNet is effective for the SPF task, our forecast-then-detect pipeline outperforms the detect-then-forecast approaches to which we compared, and that pose forecasting performance improves with the addition of unlabeled data.

Decision Making Future prediction +1

PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling

no code implementations 17 Mar 2020 Xinshuo Weng, Ye Yuan, Kris Kitani

We evaluate on KITTI and nuScenes datasets showing that our method with socially-aware feature learning and diversity sampling achieves new state-of-the-art performance on 3D MOT and trajectory prediction.

3D Multi-Object Tracking Trajectory Forecasting

Learning Shape Representations for Clothing Variations in Person Re-Identification

no code implementations 16 Mar 2020 Yu-Jhe Li, Zhengyi Luo, Xinshuo Weng, Kris M. Kitani

To tackle the re-ID problem in the context of clothing changes, we propose a novel representation learning model which is able to generate a body shape feature representation without being affected by clothing color or patterns.

Disentanglement Person Re-Identification

3D Multi-Object Tracking: A Baseline and New Evaluation Metrics

1 code implementation 9 Jul 2019 Xinshuo Weng, Jianren Wang, David Held, Kris Kitani

Additionally, 3D MOT datasets such as KITTI evaluate MOT methods in the 2D space and standardized 3D MOT evaluation tools are missing for a fair comparison of 3D MOT methods.

3D Multi-Object Tracking Autonomous Driving +1

Learning Spatio-Temporal Features with Two-Stream Deep 3D CNNs for Lipreading

no code implementations 4 May 2019 Xinshuo Weng, Kris Kitani

We evaluate different combinations of front-end and back-end modules with the grayscale video and optical flow inputs on the LRW dataset.

General Classification Lipreading +1

Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud

1 code implementation 23 Mar 2019 Xinshuo Weng, Kris Kitani

Following the pipeline of two-stage 3D detection algorithms, we detect 2D object proposals in the input image and extract a point cloud frustum from the pseudo-LiDAR for each proposal.

Monocular 3D Object Detection Monocular Depth Estimation +2

On the Importance of Video Action Recognition for Visual Lipreading

no code implementations 22 Mar 2019 Xinshuo Weng

We focus on word-level visual lipreading, which requires decoding the word from the speaker's video.

Action Recognition Lipreading +1

Future Near-Collision Prediction from Monocular Video: Feasibility, Dataset, and Challenges

1 code implementation 21 Mar 2019 Aashi Manglik, Xinshuo Weng, Eshed Ohn-Bar, Kris M. Kitani

Our results show that our proposed multi-stream CNN is the best model for predicting time to near-collision.

Robotics

Deep Reinforcement Learning for Autonomous Driving

1 code implementation 28 Nov 2018 Sen Wang, Daoyuan Jia, Xinshuo Weng

To deal with these challenges, we first adopt the deep deterministic policy gradient (DDPG) algorithm, which has the capacity to handle complex state and action spaces in continuous domains.

Autonomous Driving reinforcement-learning

Image Labeling with Markov Random Fields and Conditional Random Fields

no code implementations 28 Nov 2018 Shangxuan Wu, Xinshuo Weng

Most existing methods for object segmentation in computer vision are formulated as a labeling task.

Computer Vision Semantic Segmentation

CyLKs: Unsupervised Cycle Lucas-Kanade Network for Landmark Tracking

no code implementations 28 Nov 2018 Xinshuo Weng, Wentao Han

Across a majority of modern learning-based tracking systems, expensive annotations are needed to achieve state-of-the-art performance.

Landmark Tracking

Rotational Rectification Network: Enabling Pedestrian Detection for Mobile Vision

no code implementations 19 Jun 2017 Xinshuo Weng, Shangxuan Wu, Fares Beainy, Kris Kitani

To address this issue, we propose a Rotational Rectification Network (R2N) that can be inserted into any CNN-based pedestrian (or object) detector to adapt it to significant changes in camera rotation.

Pedestrian Detection

Visual Compiler: Synthesizing a Scene-Specific Pedestrian Detector and Pose Estimator

no code implementations 15 Dec 2016 Namhoon Lee, Xinshuo Weng, Vishnu Naresh Boddeti, Yu Zhang, Fares Beainy, Kris Kitani, Takeo Kanade

We introduce the concept of a Visual Compiler that generates a scene-specific pedestrian detector and pose estimator without any pedestrian observations.

Human Detection Pose Estimation
