no code implementations • 9 Sep 2024 • Shuhan Tan, Boris Ivanovic, Yuxiao Chen, Boyi Li, Xinshuo Weng, Yulong Cao, Philipp Krähenbühl, Marco Pavone
Simulation stands as a cornerstone for safe and efficient autonomous driving development.
no code implementations • 26 Jul 2024 • Boyi Li, Ligeng Zhu, Ran Tian, Shuhan Tan, Yuxiao Chen, Yao Lu, Yin Cui, Sushant Veer, Max Ehrlich, Jonah Philion, Xinshuo Weng, Fuzhao Xue, Andrew Tao, Ming-Yu Liu, Sanja Fidler, Boris Ivanovic, Trevor Darrell, Jitendra Malik, Song Han, Marco Pavone
Finally, we establish a benchmark for video captioning and introduce a leaderboard, aiming to accelerate advancements in video understanding, captioning, and data alignment.
no code implementations • 1 Jul 2024 • Ran Tian, Boyi Li, Xinshuo Weng, Yuxiao Chen, Edward Schmerling, Yue Wang, Boris Ivanovic, Marco Pavone
The autonomous driving industry is increasingly adopting end-to-end learning from sensory inputs to minimize human biases in system design.
1 code implementation • 21 Jun 2024 • Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, Kashyap Chitta
On a large set of challenging scenarios, we observe that simple methods with moderate compute requirements, such as TransFuser, can match recent large-scale end-to-end driving architectures such as UniAD.
no code implementations • 6 May 2024 • Jang Hyun Cho, Boris Ivanovic, Yulong Cao, Edward Schmerling, Yue Wang, Xinshuo Weng, Boyi Li, Yurong You, Philipp Krähenbühl, Yan Wang, Marco Pavone
Our experiments on outdoor benchmarks demonstrate that Cube-LLM significantly outperforms existing baselines, by 21.3 points of AP-BEV on the Talk2Car dataset for 3D grounded reasoning and by 17.7 points on the DriveLM dataset for complex reasoning about driving scenarios.
no code implementations • CVPR 2024 • Xinshuo Weng, Boris Ivanovic, Yan Wang, Yue Wang, Marco Pavone
Recent works have proposed end-to-end autonomous vehicle (AV) architectures composed of differentiable modules that achieve state-of-the-art driving performance.
1 code implementation • 7 Nov 2023 • Katie Z Luo, Xinshuo Weng, Yan Wang, Shuang Wu, Jie Li, Kilian Q Weinberger, Yue Wang, Marco Pavone
We propose a novel framework that integrates SD maps into online map prediction, together with a Transformer-based encoder, SD Map Encoder Representations from transFormers, which leverages priors in SD maps for the lane-topology prediction task.
1 code implementation • 3 Nov 2023 • Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang
We present EmerNeRF, a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes.
1 code implementation • 16 Jul 2023 • Shuhan Tan, Boris Ivanovic, Xinshuo Weng, Marco Pavone, Philipp Kraehenbuehl
In this work, we turn to language as a source of supervision for dynamic traffic scene generation.
no code implementations • 29 Jul 2022 • Yulong Cao, Danfei Xu, Xinshuo Weng, Zhuoqing Mao, Anima Anandkumar, Chaowei Xiao, Marco Pavone
We demonstrate that, compared to a model trained only on clean data, our method improves performance on adversarial data by 46% at the cost of only a 3% performance degradation on clean data.
1 code implementation • 22 Jul 2022 • Cheng-hsin Wuu, Ningyuan Zheng, Scott Ardisson, Rohan Bali, Danielle Belko, Eric Brockmeyer, Lucas Evans, Timothy Godisart, Hyowon Ha, Xuhua Huang, Alexander Hypes, Taylor Koska, Steven Krenn, Stephen Lombardi, Xiaomin Luo, Kevyn McPhail, Laura Millerschoen, Michal Perdoch, Mark Pitts, Alexander Richard, Jason Saragih, Junko Saragih, Takaaki Shiratori, Tomas Simon, Matt Stewart, Autumn Trimble, Xinshuo Weng, David Whitewolf, Chenglei Wu, Shoou-I Yu, Yaser Sheikh
Along with the release of the dataset, we conduct ablation studies on how different model architectures affect the model's ability to interpolate novel viewpoints and expressions.
7 code implementations • CVPR 2023 • Jinkun Cao, Jiangmiao Pang, Xinshuo Weng, Rawal Khirodkar, Kris Kitani
Instead of relying only on the linear state estimate (i.e., an estimation-centric approach), we use object observations (i.e., the measurements from the object detector) to compute a virtual trajectory over the occlusion period and correct the error accumulated in the filter parameters during that period (a rough sketch of this re-update idea follows below).
Ranked #2 on Multiple Object Tracking on CroHD
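As a rough illustration of the observation-centric re-update mentioned above (not the paper's implementation; the constant-velocity model, noise levels, and function names below are assumptions), one can replay a Kalman filter over observations linearly interpolated across the occlusion gap before applying the real re-detection:

```python
import numpy as np

# Minimal constant-velocity Kalman filter over 2D position; state = [x, y, vx, vy].
# Illustrative sketch of the "virtual trajectory" re-update idea only.
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # constant-velocity transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # only the position is observed
Q = np.eye(4) * 1e-2                        # process noise (assumed)
R = np.eye(2) * 1e-1                        # observation noise (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    y = z - H @ x                            # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P

def reupdate_after_occlusion(x, P, z_before, t_before, z_after, t_after):
    """Replay the filter over linearly interpolated 'virtual' observations
    spanning the occlusion gap, then apply the real re-detection."""
    gap = t_after - t_before
    for k in range(1, gap):
        z_virtual = z_before + (z_after - z_before) * (k / gap)
        x, P = predict(x, P)
        x, P = update(x, P, z_virtual)
    x, P = predict(x, P)
    return update(x, P, z_after)

# Usage sketch: occluded from frame 10 to 14, re-detected at frame 14.
x0, P0 = np.array([0., 0., 1., 0.]), np.eye(4)
x_new, P_new = reupdate_after_occlusion(x0, P0, np.array([0., 0.]), 10, np.array([4., 0.]), 14)
```

The point of the replay is that the filter's velocity and covariance after the gap reflect the interpolated motion rather than many blind predictions.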
no code implementations • CVPR 2022 • Xinshuo Weng, Boris Ivanovic, Kris Kitani, Marco Pavone
This is typically caused by the propagation of errors from tracking to prediction, such as noisy tracks, fragments, and identity switches.
1 code implementation • 18 Oct 2021 • Xinshuo Weng, Boris Ivanovic, Marco Pavone
Recently, there has been tremendous progress in developing each individual module of the standard perception-planning robot autonomy pipeline, including detection, tracking, prediction of other agents' trajectories, and ego-agent trajectory planning.
no code implementations • ICCV 2021 • Yunze Man, Xinshuo Weng, Prasanna Kumar Sivakumar, Matthew O'Toole, Kris Kitani
LiDAR sensors can be used to obtain a wide range of measurement signals other than a simple 3D point cloud, and those signals can be leveraged to improve perception tasks like 3D object detection.
1 code implementation • 8 Jul 2021 • Jinhyung Park, Xinshuo Weng, Yunze Man, Kris Kitani
To provide a more integrated approach, we propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions, which are then used to further refine the 3D boxes.
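A loose skeleton of such a task cascade (hypothetical helper names, with simplified stand-ins for the learned heads; not the MTC-RCNN code) might wire the stages together like this:

```python
import numpy as np

# Illustrative cascade: a 3D proposal is projected to the image to condition a
# 2D mask, and that mask then selects the points used to refine the 3D box.

def project_to_image(points_3d, K):
    """Pinhole projection of Nx3 camera-frame points (z > 0) with intrinsics K."""
    uvw = points_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def mask_from_proposal(corners_3d, K, height, width):
    """Stand-in for the 2D segmentation head: fill the image-space bounding
    box of the projected 3D proposal corners."""
    uv = project_to_image(corners_3d, K)
    x1, y1 = np.floor(uv.min(axis=0)).astype(int)
    x2, y2 = np.ceil(uv.max(axis=0)).astype(int)
    mask = np.zeros((height, width), dtype=bool)
    mask[max(y1, 0):min(y2, height), max(x1, 0):min(x2, width)] = True
    return mask

def refine_box_3d(points_3d, keep):
    """Stand-in for the 3D refinement head: re-fit an axis-aligned box to the
    points kept by the 2D mask."""
    kept = points_3d[keep]
    return None if len(kept) == 0 else np.concatenate([kept.min(0), kept.max(0)])

def cascade(points_3d, proposal_corners_3d, K, height, width):
    mask = mask_from_proposal(proposal_corners_3d, K, height, width)   # 3D -> 2D
    uv = np.round(project_to_image(points_3d, K)).astype(int)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    keep = np.zeros(len(points_3d), dtype=bool)
    keep[inside] = mask[uv[inside, 1], uv[inside, 0]]
    return refine_box_3d(points_3d, keep)                              # 2D -> 3D

# Usage sketch: random points in front of the camera and a coarse box proposal.
K = np.array([[100.0, 0, 64.0], [0, 100.0, 64.0], [0, 0, 1.0]])
pts = np.random.rand(500, 3) * [4, 4, 4] + [0, 0, 5]
corners = np.array([[1, 1, 6], [3, 3, 8]], dtype=float)  # two opposite corners suffice here
print(cascade(pts, corners, K, height=128, width=128))
```

In the actual model the two stand-in functions would be learned segmentation and refinement heads; the sketch only shows how information flows from 3D to 2D and back to 3D.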
no code implementations • CVPR 2021 • Yan Xu, Yu-Jhe Li, Xinshuo Weng, Kris Kitani
We address the problem of estimating the 3D pose of a network of cameras for large-environment wide-baseline scenarios, e.g., cameras for construction sites, sports stadiums, and public spaces.
2 code implementations • ICCV 2021 • Ye Yuan, Xinshuo Weng, Yanglan Ou, Kris Kitani
Instead, we would prefer a method that allows an agent's state at one time to directly affect another agent's state at a future time (a rough sketch of such joint agent-time attention follows below).
Ranked #10 on Trajectory Prediction on ETH/UCY
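One way to obtain that property (a hedged sketch under assumed dimensions, not the AgentFormer implementation) is to flatten all agents and time steps into one token sequence and let self-attention connect any agent at any time to any other agent at any earlier time:

```python
import torch
import torch.nn as nn

class AgentTimeAttention(nn.Module):
    """Minimal sketch: self-attention over a flattened (agent, time) token
    sequence, so agent i at time t can attend to agent j at an earlier time t'.
    Dimensions and the causal-in-time mask are illustrative assumptions."""

    def __init__(self, state_dim=2, d_model=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(state_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, states):
        # states: [batch, agents, time, state_dim], e.g. 2D positions over time
        B, A, T, _ = states.shape
        tokens = self.embed(states).reshape(B, A * T, -1)   # flatten agents x time
        # Block attention to strictly future time steps, regardless of agent.
        t_idx = torch.arange(T).repeat(A)                   # time index of each token
        future = t_idx[None, :] > t_idx[:, None]            # [A*T, A*T], True = blocked
        out, _ = self.attn(tokens, tokens, tokens, attn_mask=future)
        return out.reshape(B, A, T, -1)

# Usage sketch: 3 agents, 8 time steps, 2D positions.
y = AgentTimeAttention()(torch.randn(1, 3, 8, 2))
print(y.shape)  # torch.Size([1, 3, 8, 64])
```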
1 code implementation • 25 Jan 2021 • Xuanyi Dong, Yi Yang, Shih-En Wei, Xinshuo Weng, Yaser Sheikh, Shoou-I Yu
End-to-end training is made possible by differentiable registration and 3D triangulation modules.
no code implementations • ICCV 2021 • Yu-Jhe Li, Xinshuo Weng, Yan Xu, Kris M. Kitani
We propose an inter-tracklet (person-to-person) attention mechanism that learns a representation for a target tracklet while taking into account other tracklets across multiple views.
no code implementations • 10 Dec 2020 • Xinshuo Weng, Kris Kitani
Also, this threshold is sensitive to many factors, such as the target object category, so it must be re-tuned whenever these factors change.
no code implementations • 25 Aug 2020 • Xinshuo Weng, Ye Yuan, Kris Kitani
To evaluate this hypothesis, we propose a unified solution for 3D MOT and trajectory forecasting which also incorporates two additional novel computational units.
no code implementations • 20 Aug 2020 • Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani
3D multi-object tracking (MOT) is crucial to autonomous systems.
no code implementations • 18 Aug 2020 • Xinshuo Weng, Jianren Wang, David Held, Kris Kitani
Additionally, 3D MOT datasets such as KITTI evaluate MOT methods in 2D space, and there are no standardized 3D MOT evaluation tools for a fair comparison of 3D MOT methods.
1 code implementation • 23 Jun 2020 • Yongxin Wang, Kris Kitani, Xinshuo Weng
Although the two components depend on each other, prior works often design the detection and data association modules separately and train them with separate objectives.
Ranked #1 on Multi-Object Tracking on 2D MOT 2015
no code implementations • 17 Jun 2020 • Xi Sun, Xinshuo Weng, Kris Kitani
We propose a method to learn a visual-inertial feature space in which the motion of a person in video can be easily matched to the motion measured by a wearable inertial measurement unit (IMU).
1 code implementation • 12 Jun 2020 • Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani
As a result, the feature of each object is informed by the features of other objects, so that it can move towards objects with similar features (i.e., objects that likely share the same ID) and away from objects with dissimilar features (i.e., objects that likely have different IDs), yielding a more discriminative feature for each object; (2) instead of obtaining features from either 2D or 3D space as in prior work, we propose a novel joint feature extractor that learns appearance and motion features from 2D and 3D space simultaneously.
1 code implementation • CVPR 2020 • Xinshuo Weng, Yongxin Wang, Yunze Man, Kris M. Kitani
As a result, the feature of each object is informed by the features of other objects, so that it can move towards objects with similar features (i.e., objects that likely share the same ID) and away from objects with dissimilar features (i.e., objects that likely have different IDs), yielding a more discriminative feature for each object; (2) instead of obtaining features from either 2D or 3D space as in prior work, we propose a novel joint feature extractor that learns appearance and motion features from 2D and 3D space simultaneously.
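A bare-bones sketch of this cross-object feature interaction (a single generic message-passing round with assumed dimensions, not the paper's GNN) could look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureInteraction(nn.Module):
    """Minimal sketch: each object's feature is refined with an
    affinity-weighted sum of the other objects' features, pulling likely
    same-ID objects together. The dimensions and the single round of
    message passing are illustrative assumptions."""

    def __init__(self, dim=128):
        super().__init__()
        self.message = nn.Linear(dim, dim)
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, feats):
        # feats: [num_objects, dim] joint appearance + motion features
        n = feats.size(0)
        f = F.normalize(feats, dim=-1)
        affinity = f @ f.T                                     # cosine similarity
        affinity = affinity.masked_fill(torch.eye(n, dtype=torch.bool), float("-inf"))
        weights = affinity.softmax(dim=-1)                     # weight for each neighbor
        aggregated = weights @ self.message(feats)             # gather neighbors' messages
        return self.update(torch.cat([feats, aggregated], dim=-1))

# Usage sketch: 5 detected objects with 128-d features.
print(FeatureInteraction()(torch.randn(5, 128)).shape)  # torch.Size([5, 128])
```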
no code implementations • 18 Mar 2020 • Xinshuo Weng, Jianren Wang, Sergey Levine, Kris Kitani, Nicholas Rhinehart
Through experiments on a robotic manipulation dataset and two driving datasets, we show that SPFNet is effective for the SPF task, that our forecast-then-detect pipeline outperforms the detect-then-forecast approaches we compare against, and that pose forecasting performance improves with the addition of unlabeled data.
no code implementations • 17 Mar 2020 • Xinshuo Weng, Ye Yuan, Kris Kitani
We evaluate on the KITTI and nuScenes datasets, showing that our method, with socially-aware feature learning and diversity sampling, achieves new state-of-the-art performance on 3D MOT and trajectory prediction.
no code implementations • 16 Mar 2020 • Yu-Jhe Li, Zhengyi Luo, Xinshuo Weng, Kris M. Kitani
To tackle the re-ID problem in the context of clothing changes, we propose a novel representation learning model which is able to generate a body shape feature representation without being affected by clothing color or patterns.
1 code implementation • 9 Jul 2019 • Xinshuo Weng, Jianren Wang, David Held, Kris Kitani
Additionally, 3D MOT datasets such as KITTI evaluate MOT methods in 2D space, and there are no standardized 3D MOT evaluation tools for a fair comparison of 3D MOT methods.
Ranked #16 on Multiple Object Tracking on KITTI Test (Online Methods)
no code implementations • 4 May 2019 • Xinshuo Weng, Kris Kitani
We evaluate different combinations of front-end and back-end modules with the grayscale video and optical flow inputs on the LRW dataset.
1 code implementation • 23 Mar 2019 • Xinshuo Weng, Kris Kitani
Following the pipeline of two-stage 3D detection algorithms, we detect 2D object proposals in the input image and extract a point cloud frustum from the pseudo-LiDAR for each proposal.
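A simplified sketch of that frustum-extraction step (assumed intrinsics handling and variable names, not the released code): back-project the predicted depth map into a pseudo-LiDAR point cloud, then keep the points that project inside each 2D proposal:

```python
import numpy as np

def depth_to_pseudo_lidar(depth, K):
    """depth: [H, W] predicted depth map; K: 3x3 camera intrinsics.
    Returns an (H*W, 3) point cloud in the camera frame."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def extract_frustum(points, K, box_2d):
    """Keep points whose image projection lies inside box_2d = (x1, y1, x2, y2)."""
    uvw = points @ K.T
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    x1, y1, x2, y2 = box_2d
    keep = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    return points[keep]

# Usage sketch: a dummy 4x4 depth map and a box covering the left half.
K = np.array([[100.0, 0, 2.0], [0, 100.0, 2.0], [0, 0, 1.0]])
cloud = depth_to_pseudo_lidar(np.full((4, 4), 10.0), K)
print(extract_frustum(cloud, K, (0, 0, 2, 4)).shape)
```

Each frustum point cloud would then be fed to the second-stage 3D box estimator, as in the two-stage pipeline the excerpt describes.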
no code implementations • 22 Mar 2019 • Xinshuo Weng
We focus on word-level visual lipreading, which requires decoding the word from the speaker's video.
1 code implementation • 21 Mar 2019 • Aashi Manglik, Xinshuo Weng, Eshed Ohn-Bar, Kris M. Kitani
Our results show that our proposed multi-stream CNN is the best model for predicting time to near-collision.
Robotics
1 code implementation • 28 Nov 2018 • Sen Wang, Daoyuan Jia, Xinshuo Weng
To deal with these challenges, we first adopt the deep deterministic policy gradient (DDPG) algorithm, which has the capacity to handle complex state and action spaces in continuous domains.
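For context, the core DDPG update (a generic textbook form under assumed network sizes, not the paper's driving setup; target networks and the replay buffer are simplified away) alternates a critic regression toward a bootstrapped target with a deterministic policy-gradient step on the actor:

```python
import torch
import torch.nn as nn

state_dim, action_dim, gamma = 8, 2, 0.99
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s_next, done):
    # Critic: regress Q(s, a) toward r + gamma * Q(s', pi(s')).
    with torch.no_grad():
        q_next = critic(torch.cat([s_next, actor(s_next)], dim=-1))
        target = r + gamma * (1 - done) * q_next
    critic_loss = (critic(torch.cat([s, a], dim=-1)) - target).pow(2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic policy gradient, i.e. maximize Q(s, pi(s)).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    return critic_loss.item(), actor_loss.item()

# Usage sketch on a random batch of 32 transitions.
B = 32
print(ddpg_update(torch.randn(B, state_dim), torch.rand(B, action_dim) * 2 - 1,
                  torch.randn(B, 1), torch.randn(B, state_dim), torch.zeros(B, 1)))
```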
no code implementations • 28 Nov 2018 • Xinshuo Weng, Wentao Han
Across a majority of modern learning-based tracking systems, expensive annotations are needed to achieve state-of-the-art performance.
no code implementations • 28 Nov 2018 • Shangxuan Wu, Xinshuo Weng
Most existing methods for object segmentation in computer vision are formulated as a labeling task.
no code implementations • 17 Nov 2018 • Yunze Man, Xinshuo Weng, Xi Li, Kris Kitani
We focus on estimating the 3D orientation of the ground plane from a single image.
1 code implementation • CVPR 2018 • Xuanyi Dong, Shoou-I Yu, Xinshuo Weng, Shih-En Wei, Yi Yang, Yaser Sheikh
In this paper, we present supervision-by-registration, an unsupervised approach to improve the precision of facial landmark detectors on both images and video.
Ranked #1 on Facial Landmark Detection on 300-VW (C)
no code implementations • 19 Jun 2017 • Xinshuo Weng, Shangxuan Wu, Fares Beainy, Kris Kitani
To address this issue, we propose a Rotational Rectification Network (R2N) that can be inserted into any CNN-based pedestrian (or object) detector to adapt it to significant changes in camera rotation.
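A hedged sketch of the rectification idea in a generic form (assuming OpenCV is available and that the roll estimate and the detector are given as inputs; this is not the R2N architecture): rotate the image to undo the estimated camera roll, run the detector, then map the boxes back to the original image:

```python
import numpy as np
import cv2

def rectify_detect(image, roll_deg, detector):
    """Run `detector` on a roll-corrected image and map its boxes back.
    `roll_deg` is an assumed external roll estimate; `detector` returns
    a list of (x1, y1, x2, y2) boxes."""
    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)
    M = cv2.getRotationMatrix2D(center, -roll_deg, 1.0)      # undo the roll
    M_inv = cv2.getRotationMatrix2D(center, roll_deg, 1.0)   # map results back
    upright = cv2.warpAffine(image, M, (w, h))

    mapped = []
    for (x1, y1, x2, y2) in detector(upright):
        corners = np.array([[x1, y1, 1], [x2, y1, 1], [x2, y2, 1], [x1, y2, 1]], dtype=float)
        rotated = corners @ M_inv.T                          # 2x3 affine on homogeneous points
        mapped.append((rotated[:, 0].min(), rotated[:, 1].min(),
                       rotated[:, 0].max(), rotated[:, 1].max()))
    return mapped

# Usage sketch with a dummy detector that always returns one box.
dummy = lambda img: [(10, 20, 50, 80)]
print(rectify_detect(np.zeros((240, 320, 3), np.uint8), roll_deg=15.0, detector=dummy))
```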
no code implementations • 15 Dec 2016 • Namhoon Lee, Xinshuo Weng, Vishnu Naresh Boddeti, Yu Zhang, Fares Beainy, Kris Kitani, Takeo Kanade
We introduce the concept of a Visual Compiler that generates a scene specific pedestrian detector and pose estimator without any pedestrian observations.