no code implementations • 16 Jun 2025 • Yuqing Wen, Kefan Gu, Haoxuan Liu, Yucheng Zhao, Tiancai Wang, Haoqiang Fan, Xiaoyan Sun
Vision-Language-Action (VLA) models have recently made significant advances in multi-task, end-to-end robotic control, due to the strong generalization capabilities of Vision-Language Models (VLMs).
no code implementations • 5 Jun 2025 • Yani Zhang, Dongming Wu, Hao Shi, Yingfei Liu, Tiancai Wang, Haoqiang Fan, Xingping Dong
In this study, we explore a fundamental question: Does embodied 3D grounding benefit enough from detection?
no code implementations • 8 May 2025 • Genghua Kou, Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Ziheng Zhang, Osamu Yoshie, Tiancai Wang, Ying Li, Xiangyu Zhang
In this paper, we propose PADriver, a novel closed-loop framework for personalized autonomous driving (PAD).
no code implementations • 2 Apr 2025 • Haochen Wang, Yucheng Zhao, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Zhaoxiang Zhang
The latter aims to aggregate information from all available views to recover Bird's-Eye-View images, contributing to a comprehensive overview of the entire scene.
no code implementations • 16 Feb 2025 • Zihan Lan, Weixin Mao, Haosheng Li, Le Wang, Tiancai Wang, Haoqiang Fan, Osamu Yoshie
Built upon the visual backbone of the policy network, we design a lightweight network to predict the importance score of each view.
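The view-scoring idea above can be sketched as a small scoring head. This is a hypothetical toy, not the authors' actual architecture: a two-layer MLP maps each view's pooled backbone feature to a scalar, and a softmax across views yields normalized importance scores.

```python
import numpy as np

# Hypothetical per-view importance head (all shapes and names are assumptions):
# each view's pooled backbone feature -> scalar logit -> softmax across views.
rng = np.random.default_rng(0)

def view_importance(features, w1, b1, w2, b2):
    """features: (num_views, feat_dim) pooled backbone features."""
    h = np.maximum(features @ w1 + b1, 0.0)   # ReLU hidden layer
    logits = (h @ w2 + b2).squeeze(-1)        # one scalar per view
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()

num_views, feat_dim, hidden = 4, 16, 8
w1 = rng.standard_normal((feat_dim, hidden)) * 0.1
b1 = np.zeros(hidden)
w2 = rng.standard_normal((hidden, 1)) * 0.1
b2 = np.zeros(1)

scores = view_importance(rng.standard_normal((num_views, feat_dim)), w1, b1, w2, b2)
print(scores.shape)  # one normalized importance score per view
```

The scores sum to one, so they can directly weight or select views downstream.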
no code implementations • 11 Dec 2024 • Haosheng Li, Weixin Mao, Weipeng Deng, Chenyu Meng, Haoqiang Fan, Tiancai Wang, Yoshie Osamu, Ping Tan, Hongan Wang, Xiaoming Deng
Multi-hand semantic grasp generation aims to generate feasible and semantically appropriate grasp poses for different robotic hands based on natural language instructions.
no code implementations • CVPR 2025 • Bohan Li, Jiazhe Guo, Hongsi Liu, Yingshuang Zou, Yikang Ding, Xiwu Chen, Hu Zhu, Feiyang Tan, Chi Zhang, Tiancai Wang, Shuchang Zhou, Li Zhang, Xiaojuan Qi, Hao Zhao, Mu Yang, Wenjun Zeng, Xin Jin
UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data, respectively, with two novel transfer strategies of Gaussian-based Joint Rendering and Prior-guided Sparse Modeling.
1 code implementation • 29 Nov 2024 • Weixin Mao, Weiheng Zhong, Zhou Jiang, Dong Fang, Zhongyue Zhang, Zihan Lan, Haosheng Li, Fan Jia, Tiancai Wang, Haoqiang Fan, Osamu Yoshie
To address this, we propose RoboMatrix, a skill-centric hierarchical framework designed for scalable robot task planning and execution in open-world environments.
no code implementations • 18 Nov 2024 • Xinhai Li, Jialin Li, Ziheng Zhang, Rui Zhang, Fan Jia, Tiancai Wang, Haoqiang Fan, Kuo-Kun Tseng, Ruiping Wang
To address these limitations, we introduce RoboGSim, a real2sim2real robotic simulator powered by 3D Gaussian Splatting and a physics engine.
1 code implementation • 4 Nov 2024 • Meng Cao, Yuyang Liu, Yingfei Liu, Tiancai Wang, Jiahua Dong, Henghui Ding, Xiangyu Zhang, Ian Reid, Xiaodan Liang
In terms of methodology, we propose Continual LLaVA, a rehearsal-free method tailored for continual instruction tuning in LVLMs.
1 code implementation • 12 Oct 2024 • Haochen Wang, Anlin Zheng, Yucheng Zhao, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Zhaoxiang Zhang
This paper introduces reconstructive visual instruction tuning (ROSS), a family of Large Multimodal Models (LMMs) that exploit vision-centric supervision signals.
1 code implementation • 14 Aug 2024 • Yuqing Wen, Yucheng Zhao, Yingfei Liu, Binyuan Huang, Fan Jia, Yanhui Wang, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang
The field of autonomous driving increasingly demands high-quality annotated video training data.
no code implementations • 28 May 2024 • Yifan Bai, Dongming Wu, Yingfei Liu, Fan Jia, Weixin Mao, Ziheng Zhang, Yucheng Zhao, Jianbing Shen, Xing Wei, Tiancai Wang, Xiangyu Zhang
Despite its simplicity, Atlas demonstrates superior performance in both 3D detection and ego planning tasks on the nuScenes dataset, proving that the 3D-tokenized LLM is the key to reliable autonomous driving.
no code implementations • 14 May 2024 • Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, Zeliang Ma, Dengyi Ji, Haiwen Li, Xingliang Huang, Yu Tian, Genghua Kou, Fan Jia, Yingfei Liu, Tiancai Wang, Ying Li, Xiaoshuai Hao, Yifan Yang, HUI ZHANG, Mengchuan Wei, Yi Zhou, Haimei Zhao, Jing Zhang, Jinke Li, Xiao He, Xiaoqiang Cheng, Bingyang Zhang, Lirong Zhao, Dianlei Ding, Fangsheng Liu, Yixiang Yan, Hongming Wang, Nanfei Ye, Lun Luo, Yubo Tian, Yiwei Zuo, Zhe Cao, Yi Ren, Yunfan Li, Wenjie Liu, Xun Wu, Yifan Mao, Ming Li, Jian Liu, Jiayang Liu, Zihan Qin, Cunxi Chu, Jialei Xu, Wenbo Zhao, Junjun Jiang, Xianming Liu, Ziyan Wang, Chiwei Li, Shilong Li, Chendong Yuan, Songyue Yang, Wentao Liu, Peng Chen, Bin Zhou, YuBo Wang, Chi Zhang, Jianhang Sun, Hai Chen, Xiao Yang, Lizhong Wang, Dongyi Fu, Yongchun Lin, Huitong Yang, Haoang Li, Yadan Luo, Xianjing Cheng, Yong Xu
In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles.
no code implementations • 28 Mar 2024 • Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, Zhenzhong Chen, Xiangyu Zhang
Autonomous driving progress relies on large-scale annotated datasets.
no code implementations • 17 Jan 2024 • Shuo Wang, Fan Jia, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, Feng Zhao
This paper introduces the Stream Query Denoising (SQD) strategy as a novel approach for temporal modeling in high-definition map (HD-map) construction.
1 code implementation • 21 Dec 2023 • Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tiancai Wang, Xiangyu Zhang, Zhaoxiang Zhang
To empower the model as a teacher, we propose Hard Patches Mining (HPM), predicting patch-wise losses and subsequently determining where to mask.
2 code implementations • 6 Dec 2023 • Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao
With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem.
no code implementations • 30 Nov 2023 • En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao
Then, FIT requires MLLMs to first predict trajectories of related objects and then reason about potential future events based on them.
Ranked #164 on Visual Question Answering on MM-Vet
no code implementations • 29 Nov 2023 • Weixin Mao, Tiancai Wang, Diankun Zhang, Junjie Yan, Osamu Yoshie
Pillar-based methods mainly employ a randomly initialized 2D convolutional neural network (ConvNet) for feature extraction and fail to enjoy the benefits of backbone scaling and pretraining in the image domain.
1 code implementation • CVPR 2024 • Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang
This work notably propels the field of autonomous driving by effectively augmenting the training dataset used for advanced BEV perception techniques.
no code implementations • 22 Nov 2023 • Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Yuqing Wen, Chi Zhang, Xiangyu Zhang, Tiancai Wang
Based on the vision-action pairs, we construct a general world model based on an MLLM and a diffusion model for autonomous driving, termed ADriver-I.
no code implementations • 20 Nov 2023 • Shuailin Li, Yuang Zhang, Yucheng Zhao, Qiuyue Wang, Fan Jia, Yingfei Liu, Tiancai Wang
Despite the rapid development of video Large Language Models (LLMs), a comprehensive evaluation is still absent.
1 code implementation • 10 Oct 2023 • Dongming Wu, Jiahao Chang, Fan Jia, Yingfei Liu, Tiancai Wang, Jianbing Shen
Further, we propose TopoMLP, a simple yet high-performance pipeline for driving topology reasoning.
Ranked #5 on 3D Lane Detection on OpenLane-V2 val
1 code implementation • 8 Sep 2023 • Dongming Wu, Wencheng Han, Yingfei Liu, Tiancai Wang, Cheng-Zhong Xu, Xiangyu Zhang, Jianbing Shen
Furthermore, we provide a simple end-to-end baseline model based on Transformer, named PromptTrack.
1 code implementation • 18 Aug 2023 • Xiaohui Jiang, Shuailin Li, Yingfei Liu, Shihao Wang, Fan Jia, Tiancai Wang, Lijin Han, Xiangyu Zhang
Recently, 3D object detection from surround-view images has made notable advancements with its low deployment cost.
Ranked #1 on 3D Object Detection on nuScenes Camera Only
1 code implementation • ICCV 2023 • Dongming Wu, Tiancai Wang, Yuang Zhang, Xiangyu Zhang, Jianbing Shen
Referring video object segmentation (RVOS) aims at segmenting an object in a video following human instruction.
Referring Expression Segmentation • Referring Video Object Segmentation • +2
1 code implementation • 16 Jun 2023 • Dongming Wu, Fan Jia, Jiahao Chang, Zhuoling Li, Jianjian Sun, Chunrui Han, Shuailin Li, Yingfei Liu, Zheng Ge, Tiancai Wang
We present the 1st-place solution of OpenLane Topology in Autonomous Driving Challenge.
no code implementations • 23 May 2023 • En Yu, Tiancai Wang, Zhuoling Li, Yuang Zhang, Xiangyu Zhang, Wenbing Tao
Although end-to-end multi-object trackers like MOTR enjoy the merits of simplicity, they suffer severely from the conflict between detection and association, resulting in unsatisfactory convergence dynamics.
2 code implementations • ICCV 2023 • Shihao Wang, Yingfei Liu, Tiancai Wang, Ying Li, Xiangyu Zhang
On the standard nuScenes benchmark, it is the first online multi-view method that achieves comparable performance (67.6% NDS & 65.3% AMOTA) with LiDAR-based methods.
Ranked #1 on 3D Multi-Object Tracking on nuScenes Camera Only
1 code implementation • CVPR 2023 • Dongming Wu, Wencheng Han, Tiancai Wang, Xingping Dong, Xiangyu Zhang, Jianbing Shen
In this paper, we propose a new and general referring understanding task, termed referring multi-object tracking (RMOT).
2 code implementations • ICCV 2023 • Junjie Yan, Yingfei Liu, Jianjian Sun, Fan Jia, Shuailin Li, Tiancai Wang, Xiangyu Zhang
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection.
4 code implementations • CVPR 2023 • Yuang Zhang, Tiancai Wang, Xiangyu Zhang
In this paper, we propose MOTRv2, a simple yet effective pipeline to bootstrap end-to-end multi-object tracking with a pretrained object detector.
Ranked #3 on Multi-Object Tracking on DanceTrack (using extra training data)
Multi-Object Tracking • Multiple Object Tracking with Transformer • +2
no code implementations • 15 Nov 2022 • Jinrong Yang, Tiancai Wang, Zheng Ge, Weixin Mao, Xiaoping Li, Xiangyu Zhang
We propose a temporal 2D transformation to bridge the 3D predictions with temporal 2D labels.
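The bridging idea can be illustrated with standard projective geometry. This is a hedged sketch under assumptions (the intrinsics, ego-motion matrix, and point are illustrative placeholders, not the paper's setup): a predicted 3D center is carried into another frame by the known ego motion and projected with the camera intrinsics, yielding a 2D point that can be compared against a temporal 2D label.

```python
import numpy as np

# Illustrative projection of a 3D prediction into another frame's image plane.
# K, T_ego, and p3d are made-up placeholders for the sketch.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])            # pinhole camera intrinsics
T_ego = np.eye(4)
T_ego[0, 3] = -2.0                         # ego moved 2 m along x between frames

p3d = np.array([10.0, 0.0, 20.0, 1.0])     # predicted 3D center, homogeneous
p_cam = (T_ego @ p3d)[:3]                  # center in the other frame's camera
uvw = K @ p_cam
uv = uvw[:2] / uvw[2]                      # perspective division to pixels
print(uv)
```

The resulting pixel coordinates can then be matched against a 2D annotation in that frame, which is the sense in which 2D labels supervise 3D predictions.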
3 code implementations • 27 Oct 2022 • Yuang Zhang, Tiancai Wang, Weiyao Lin, Xiangyu Zhang
We present our 1st place solution to the Group Dance Multiple People Tracking Challenge.
Multi-Object Tracking • Multiple Object Tracking with Transformer • +1
1 code implementation • ICCV 2023 • Yingfei Liu, Junjie Yan, Fan Jia, Shuailin Li, Aqi Gao, Tiancai Wang, Xiangyu Zhang, Jian Sun
More specifically, we extend the 3D position embedding (3D PE) in PETR for temporal modeling.
Ranked #2 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU lane - 224x480 - 100x100 at 0.5 metric)
1 code implementation • CVPR 2022 • Zhiyuan Liang, Tiancai Wang, Xiangyu Zhang, Jian Sun, Jianbing Shen
The tree energy loss is effective and easy to incorporate into existing frameworks by combining it with a traditional segmentation loss.
1 code implementation • 10 Mar 2022 • Yingfei Liu, Tiancai Wang, Xiangyu Zhang, Jian Sun
Object queries can perceive the 3D position-aware features and perform end-to-end object detection.
Ranked #3 on 3D Object Detection on TruckScenes
1 code implementation • 9 Dec 2021 • Lufan Ma, Tiancai Wang, Bin Dong, Jiangpeng Yan, Xiu Li, Xiangyu Zhang
Our IFR enjoys several advantages: 1) it simulates an infinite-depth refinement network while requiring only the parameters of a single residual block; 2) it produces high-level equilibrium instance features with a global receptive field; 3) it serves as a plug-and-play general module easily extended to most object recognition frameworks.
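The equilibrium idea behind point 1) can be sketched with a toy fixed-point iteration. This is an assumption-laden illustration, not the paper's network: a single residual-style block f(z, x) = tanh(Wz + x) is iterated until z stops changing, so the output behaves like that of an infinitely deep stack of the same block while only one block's parameters are stored.

```python
import numpy as np

# Toy deep-equilibrium sketch: iterate one block to its fixed point z* = f(z*, x).
def fixed_point(x, W, tol=1e-6, max_iter=500):
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + x)        # one application of the shared block
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.1      # small weights keep the map contractive
x = rng.standard_normal(4)
z_star = fixed_point(x, W)
# z* satisfies the block equation up to tolerance
print(np.linalg.norm(np.tanh(W @ z_star + x) - z_star))
```

In deep-equilibrium models the backward pass is handled via implicit differentiation rather than unrolling; this sketch covers only the forward fixed-point solve.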
1 code implementation • NeurIPS 2021 • Bin Dong, Fangao Zeng, Tiancai Wang, Xiangyu Zhang, Yichen Wei
Moreover, the joint learning of unified query representation can greatly improve the detection performance of DETR.
Ranked #5 on Object Detection on COCO minival (AP75 metric)
2 code implementations • 7 May 2021 • Fangao Zeng, Bin Dong, Yuang Zhang, Tiancai Wang, Xiangyu Zhang, Yichen Wei
Temporal modeling of objects is a key challenge in multiple object tracking (MOT).
Ranked #1 on Multi-Object Tracking on MOT17 (e2e-MOT metric)
Multi-Object Tracking • Multiple Object Tracking with Transformer • +1
no code implementations • 25 Dec 2020 • Tiancai Wang, Xiangyu Zhang, Jian Sun
In this paper, we present an implicit feature pyramid network (i-FPN) for object detection.
1 code implementation • 3 Dec 2020 • Tiancai Wang, Tong Yang, Jiale Cao, Xiangyu Zhang
Object detectors usually achieve promising results with the supervision of complete instance annotations.
1 code implementation • 13 Aug 2020 • Jialian Wu, Liangchen Song, Tiancai Wang, Qian Zhang, Junsong Yuan
In the classification tree, as the number of parent class nodes is significantly smaller, their logits are less noisy and can be utilized to suppress the wrong or noisy logits in the fine-grained class nodes.
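The suppression idea can be sketched with a toy two-level hierarchy. This is a hypothetical gating rule under assumptions (the tree, logits, and gating-by-multiplication are illustrative, not the paper's exact formulation): each leaf probability is weighted by its parent's probability, so fine-grained classes under an unlikely parent are damped even if their own logits are noisy and high.

```python
import numpy as np

# Toy hierarchy: 2 parents ("animal", "vehicle"), 2 leaves each.
def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

parent_logits = np.array([3.0, -1.0])          # parent head: clean, low-noise
leaf_logits = np.array([2.0, 1.5, 2.2, 1.8])   # leaf head: noisier scores
parent_of_leaf = np.array([0, 0, 1, 1])        # which parent each leaf belongs to

# Gate each leaf's probability by its parent's probability.
p_parent = softmax(parent_logits)
gated = softmax(leaf_logits) * p_parent[parent_of_leaf]
best_leaf = int(np.argmax(gated))
print(best_leaf)
```

Here the raw leaf argmax would pick leaf 2 (logit 2.2), but because its parent is very unlikely, the gated scores favor a leaf under the high-probability parent instead.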
Ranked #5 on Few-Shot Object Detection on LVIS v1.0 val
1 code implementation • CVPR 2020 • Tiancai Wang, Tong Yang, Martin Danelljan, Fahad Shahbaz Khan, Xiangyu Zhang, Jian Sun
Human-object interaction (HOI) detection strives to localize both the human and the object, as well as to identify the complex interactions between them.
no code implementations • ICCV 2019 • Tiancai Wang, Rao Muhammad Anwer, Muhammad Haris Khan, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Jorma Laaksonen
Our approach outperforms the state-of-the-art on all datasets.
1 code implementation • ICCV 2019 • Tiancai Wang, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao
We introduce a single-stage detection framework that combines the advantages of both fine-tuning pretrained models and training from scratch.