Search Results for author: Tiancai Wang

Found 47 papers, 27 papers with code

ROSA: Harnessing Robot States for Vision-Language and Action Alignment

no code implementations 16 Jun 2025 Yuqing Wen, Kefan Gu, Haoxuan Liu, Yucheng Zhao, Tiancai Wang, Haoqiang Fan, Xiaoyan Sun

Vision-Language-Action (VLA) models have recently made significant advances in multi-task, end-to-end robotic control, due to the strong generalization capabilities of Vision-Language Models (VLMs).

State Estimation · Vision-Language-Action

Grounding Beyond Detection: Enhancing Contextual Understanding in Embodied 3D Grounding

no code implementations 5 Jun 2025 Yani Zhang, Dongming Wu, Hao Shi, Yingfei Liu, Tiancai Wang, Haoqiang Fan, Xingping Dong

In this study, we explore a fundamental question: Does embodied 3D grounding benefit enough from detection?

Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

no code implementations 2 Apr 2025 Haochen Wang, Yucheng Zhao, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Zhaoxiang Zhang

The latter aims to aggregate information from all available views to recover Bird's-Eye-View images, contributing to a comprehensive overview of the entire scene.

Scene Understanding

BFA: Best-Feature-Aware Fusion for Multi-View Fine-grained Manipulation

no code implementations 16 Feb 2025 Zihan Lan, Weixin Mao, Haosheng Li, Le Wang, Tiancai Wang, Haoqiang Fan, Osamu Yoshie

On top of the visual backbone of the policy network, we design a lightweight network to predict the importance score of each view.
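
The fusion step described above lends itself to a short sketch. Below is a minimal PyTorch reading of the idea, assuming a lightweight scoring head over per-view backbone features; module names and dimensions are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of best-feature-aware (BFA) fusion: a lightweight
# head scores each camera view, and view features are reweighted before
# entering the policy network. Sizes and names are assumptions.
import torch
import torch.nn as nn

class BFAFusion(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Lightweight scoring network: one importance score per view.
        self.score_head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, feat_dim) from the visual backbone.
        scores = self.score_head(view_feats)       # (B, V, 1)
        weights = torch.softmax(scores, dim=1)     # normalize across views
        return (weights * view_feats).sum(dim=1)   # fused (B, feat_dim)

fused = BFAFusion()(torch.randn(2, 4, 256))  # 2 samples, 4 camera views
print(fused.shape)  # torch.Size([2, 256])
```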

Multi-GraspLLM: A Multimodal LLM for Multi-Hand Semantic Guided Grasp Generation

no code implementations 11 Dec 2024 Haosheng Li, Weixin Mao, Weipeng Deng, Chenyu Meng, Haoqiang Fan, Tiancai Wang, Yoshie Osamu, Ping Tan, Hongan Wang, Xiaoming Deng

Multi-hand semantic grasp generation aims to generate feasible and semantically appropriate grasp poses for different robotic hands based on natural language instructions.

Grasp Generation

UniScene: Unified Occupancy-centric Driving Scene Generation

no code implementations CVPR 2025 Bohan Li, Jiazhe Guo, Hongsi Liu, Yingshuang Zou, Yikang Ding, Xiwu Chen, Hu Zhu, Feiyang Tan, Chi Zhang, Tiancai Wang, Shuchang Zhou, Li Zhang, Xiaojuan Qi, Hao Zhao, Mu Yang, Wenjun Zeng, Xin Jin

UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data, respectively, with two novel transfer strategies of Gaussian-based Joint Rendering and Prior-guided Sparse Modeling.

Autonomous Driving · Scene Generation

RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World

1 code implementation 29 Nov 2024 Weixin Mao, Weiheng Zhong, Zhou Jiang, Dong Fang, Zhongyue Zhang, Zihan Lan, Haosheng Li, Fan Jia, Tiancai Wang, Haoqiang Fan, Osamu Yoshie

To address this, we propose RoboMatrix, a skill-centric hierarchical framework designed for scalable robot task planning and execution in open-world environments.

Scheduling · Task Planning +1

RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator

no code implementations 18 Nov 2024 Xinhai Li, Jialin Li, Ziheng Zhang, Rui Zhang, Fan Jia, Tiancai Wang, Haoqiang Fan, Kuo-Kun Tseng, Ruiping Wang

To address these limitations, we introduce RoboGSim, a real2sim2real robotic simulator powered by 3D Gaussian Splatting and a physics engine.

Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models

1 code implementation 4 Nov 2024 Meng Cao, Yuyang Liu, Yingfei Liu, Tiancai Wang, Jiahua Dong, Henghui Ding, Xiangyu Zhang, Ian Reid, Xiaodan Liang

In terms of methodology, we propose Continual LLaVA, a rehearsal-free method tailored for continual instruction tuning in LVLMs.

Reconstructive Visual Instruction Tuning

1 code implementation 12 Oct 2024 Haochen Wang, Anlin Zheng, Yucheng Zhao, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Zhaoxiang Zhang

This paper introduces reconstructive visual instruction tuning (ROSS), a family of Large Multimodal Models (LMMs) that exploit vision-centric supervision signals.

Denoising

Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?

no code implementations 28 May 2024 Yifan Bai, Dongming Wu, Yingfei Liu, Fan Jia, Weixin Mao, Ziheng Zhang, Yucheng Zhao, Jianbing Shen, Xing Wei, Tiancai Wang, Xiangyu Zhang

Despite its simplicity, Atlas demonstrates superior performance in both 3D detection and ego planning tasks on the nuScenes dataset, proving that a 3D-tokenized LLM is the key to reliable autonomous driving.

3D Object Detection · Autonomous Driving +4

Stream Query Denoising for Vectorized HD Map Construction

no code implementations 17 Jan 2024 Shuo Wang, Fan Jia, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, Feng Zhao

This paper introduces the Stream Query Denoising (SQD) strategy as a novel approach for temporal modeling in high-definition map (HD-map) construction.

Autonomous Driving · Denoising
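
The denoising strategy can be pictured with a toy sketch: simulate the previous frame's streamed queries by perturbing ground-truth map elements, then train a head to recover the clean geometry. The decoder, shapes, and noise scale below are placeholder assumptions, not the paper's architecture.

```python
# Toy sketch of stream query denoising (SQD): temporal queries from the
# previous frame are simulated by adding noise to ground-truth map
# elements, and the model is trained to recover the clean geometry.
import torch
import torch.nn as nn

def make_denoising_queries(gt_points: torch.Tensor, noise_scale: float = 0.1):
    # gt_points: (num_elements, num_points, 2) vectorized map elements.
    return gt_points + torch.randn_like(gt_points) * noise_scale

decoder = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))

gt = torch.rand(5, 20, 2)                    # 5 elements, 20 points each
noisy = make_denoising_queries(gt)           # simulated streamed queries
dn_loss = nn.functional.l1_loss(decoder(noisy), gt)  # auxiliary denoising loss
```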

Bootstrap Masked Visual Modeling via Hard Patches Mining

1 code implementation 21 Dec 2023 Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tiancai Wang, Xiangyu Zhang, Zhaoxiang Zhang

To empower the model as a teacher, we propose Hard Patches Mining (HPM), predicting patch-wise losses and subsequently determining where to mask.
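
The where-to-mask step admits a compact sketch: take the per-patch losses predicted by the auxiliary head and mask the patches the model expects to find hardest. The mask ratio and shapes below are illustrative assumptions, standing in for the paper's masked-image-modeling pipeline.

```python
# Sketch of hard patches mining: mask the patches whose reconstruction
# loss the auxiliary head predicts to be highest.
import torch

def select_hard_mask(predicted_patch_loss: torch.Tensor, mask_ratio: float = 0.75):
    # predicted_patch_loss: (batch, num_patches) from the loss-prediction head.
    num_mask = int(predicted_patch_loss.shape[1] * mask_ratio)
    hard_idx = predicted_patch_loss.topk(num_mask, dim=1).indices
    mask = torch.zeros_like(predicted_patch_loss, dtype=torch.bool)
    mask.scatter_(1, hard_idx, True)   # True = masked, must be reconstructed
    return mask

mask = select_hard_mask(torch.rand(2, 196))  # 14x14 patch grid
print(mask.sum(dim=1))  # tensor([147, 147]) masked patches per image
```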

Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

2 code implementations 6 Dec 2023 Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao

With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem.

Autonomous Driving

Merlin: Empowering Multimodal LLMs with Foresight Minds

no code implementations 30 Nov 2023 En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao

Then, FIT requires MLLMs to first predict trajectories of related objects and then reason about potential future events based on them.

Visual Question Answering

PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection

no code implementations 29 Nov 2023 Weixin Mao, Tiancai Wang, Diankun Zhang, Junjie Yan, Osamu Yoshie

Pillar-based methods mainly employ a randomly initialized 2D convolutional neural network (ConvNet) for feature extraction and fail to enjoy the benefits of backbone scaling and pretraining in the image domain.

3D Object Detection · object-detection
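
The core observation translates into a short sketch: rasterize pillar features into a BEV pseudo-image and feed it to a scalable image backbone that can inherit image-domain pretraining. The 1x1 stem adapter and the ConvNeXt choice are illustrative assumptions; the paper adapts the backbone itself rather than squeezing the BEV channels down to RGB as done here.

```python
# Sketch of the PillarNeSt idea: treat the pillar BEV pseudo-image like
# an image and feed it to a 2D ConvNet that can reuse image pretraining.
import torch
import torch.nn as nn
from torchvision.models import convnext_tiny  # ImageNet-pretrainable backbone

pillar_channels = 64
stem = nn.Conv2d(pillar_channels, 3, kernel_size=1)  # crude channel adapter
backbone = convnext_tiny(weights=None)   # pass weights="IMAGENET1K_V1" to pretrain
backbone.classifier = nn.Identity()      # keep features only, drop the classifier

bev = torch.randn(1, pillar_channels, 224, 224)  # pillar pseudo-image
feats = backbone(stem(bev))
print(feats.shape)
```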

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

1 code implementation CVPR 2024 Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

This work notably propels the field of autonomous driving by effectively augmenting the training dataset used for advanced BEV perception techniques.

Autonomous Driving · Video Generation

ADriver-I: A General World Model for Autonomous Driving

no code implementations 22 Nov 2023 Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Yuqing Wen, Chi Zhang, Xiangyu Zhang, Tiancai Wang

Based on the vision-action pairs, we construct a general world model for autonomous driving, built on an MLLM and a diffusion model and termed ADriver-I.

Autonomous Driving

VLM-Eval: A General Evaluation on Video Large Language Models

no code implementations 20 Nov 2023 Shuailin Li, Yuang Zhang, Yucheng Zhao, Qiuyue Wang, Fan Jia, Yingfei Liu, Tiancai Wang

Despite the rapid development of video Large Language Models (LLMs), a comprehensive evaluation is still absent.

Action Recognition · Retrieval

Language Prompt for Autonomous Driving

1 code implementation 8 Sep 2023 Dongming Wu, Wencheng Han, Yingfei Liu, Tiancai Wang, Cheng-Zhong Xu, Xiangyu Zhang, Jianbing Shen

Furthermore, we provide a simple end-to-end baseline model based on Transformer, named PromptTrack.

Autonomous Driving · Object

MOTRv3: Release-Fetch Supervision for End-to-End Multi-Object Tracking

no code implementations 23 May 2023 En Yu, Tiancai Wang, Zhuoling Li, Yuang Zhang, Xiangyu Zhang, Wenbing Tao

Although end-to-end multi-object trackers like MOTR enjoy the merits of simplicity, they suffer seriously from the conflict between detection and association, resulting in unsatisfactory convergence dynamics.

Denoising · Multi-Object Tracking +1

Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

2 code implementations ICCV 2023 Shihao Wang, Yingfei Liu, Tiancai Wang, Ying Li, Xiangyu Zhang

On the standard nuScenes benchmark, it is the first online multi-view method that achieves comparable performance (67.6% NDS & 65.3% AMOTA) with LiDAR-based methods.

3D Multi-Object Tracking · 3D Object Detection +2
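
Object-centric temporal modeling can be sketched as a small query memory: after each frame, the top-scoring object queries are pushed into a queue and concatenated with the next frame's learnable queries. Queue sizes and the selection rule below are illustrative assumptions.

```python
# Sketch of an object-centric query memory for temporal modeling:
# propagate the most confident object queries to later frames.
import torch

class QueryMemory:
    def __init__(self, top_k: int = 128, max_frames: int = 4, dim: int = 256):
        self.top_k, self.max_frames, self.dim, self.queue = top_k, max_frames, dim, []

    def update(self, queries: torch.Tensor, scores: torch.Tensor):
        # queries: (num_queries, dim); scores: (num_queries,) confidences.
        idx = scores.topk(self.top_k).indices
        self.queue.append(queries[idx])
        self.queue = self.queue[-self.max_frames:]  # keep a short history

    def temporal_queries(self) -> torch.Tensor:
        # Concatenated with the next frame's learnable queries.
        return torch.cat(self.queue, dim=0) if self.queue else torch.empty(0, self.dim)

memory = QueryMemory()
memory.update(torch.randn(900, 256), torch.rand(900))
print(memory.temporal_queries().shape)  # torch.Size([128, 256])
```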

Referring Multi-Object Tracking

1 code implementation CVPR 2023 Dongming Wu, Wencheng Han, Tiancai Wang, Xingping Dong, Xiangyu Zhang, Jianbing Shen

In this paper, we propose a new and general referring understanding task, termed referring multi-object tracking (RMOT).

Object · Referring Multi-Object Tracking

MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors

4 code implementations CVPR 2023 Yuang Zhang, Tiancai Wang, Xiangyu Zhang

In this paper, we propose MOTRv2, a simple yet effective pipeline to bootstrap end-to-end multi-object tracking with a pretrained object detector.

Ranked #3 on Multi-Object Tracking on DanceTrack (using extra training data)

Multi-Object Tracking · Multiple Object Tracking with Transformer +2
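
The bootstrapping step can be pictured as follows: boxes from a frozen, pretrained detector (YOLOX in the paper) are embedded into proposal queries that the end-to-end tracker then refines. The embedding scheme below is a simplified assumption.

```python
# Sketch of MOTRv2-style bootstrapping: detections from a frozen detector
# become proposal queries for the transformer decoder, easing the
# detection/association conflict.
import torch
import torch.nn as nn

def boxes_to_proposal_queries(boxes: torch.Tensor, scores: torch.Tensor,
                              embed: nn.Linear) -> torch.Tensor:
    # boxes: (N, 4) as normalized cx, cy, w, h; scores: (N,).
    proposals = torch.cat([boxes, scores[:, None]], dim=1)  # (N, 5)
    return embed(proposals)  # (N, dim) queries for the tracking decoder

embed = nn.Linear(5, 256)
detections = torch.rand(30, 4)            # e.g. from a frozen YOLOX
queries = boxes_to_proposal_queries(detections, torch.rand(30), embed)
print(queries.shape)  # torch.Size([30, 256])
```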

Towards 3D Object Detection with 2D Supervision

no code implementations 15 Nov 2022 Jinrong Yang, Tiancai Wang, Zheng Ge, Weixin Mao, Xiaoping Li, Xiangyu Zhang

We propose a temporal 2D transformation to bridge the 3D predictions with temporal 2D labels.

3D Object Detection · Object +1

Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation

1 code implementation CVPR 2022 Zhiyuan Liang, Tiancai Wang, Xiangyu Zhang, Jian Sun, Jianbing Shen

The tree energy loss is effective and easy to incorporate into existing frameworks by combining it with a traditional segmentation loss.

Segmentation · Semantic Segmentation
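
The plug-in combination looks roughly like the sketch below: a partial cross-entropy on the sparsely labeled pixels plus a weighted low-level affinity term on the rest. Note the affinity term here is a crude neighbor-similarity stand-in, not the paper's minimum-spanning-tree energy, which is considerably more involved.

```python
# Sketch of combining a sparse-label segmentation loss with an auxiliary
# affinity term (a stand-in for the tree energy loss).
import torch
import torch.nn.functional as F

def neighbor_smoothness(probs, image):
    # Encourage similar predictions for horizontally adjacent pixels
    # with similar colors (crude proxy for tree-based affinity).
    color_diff = (image[..., :, 1:] - image[..., :, :-1]).pow(2).sum(1)
    affinity = torch.exp(-color_diff)                          # (B, H, W-1)
    pred_diff = (probs[..., :, 1:] - probs[..., :, :-1]).abs().sum(1)
    return (affinity * pred_diff).mean()

logits = torch.randn(2, 21, 64, 64, requires_grad=True)
image = torch.rand(2, 3, 64, 64)
labels = torch.randint(-1, 21, (2, 64, 64))                    # -1 = unlabeled
ce = F.cross_entropy(logits, labels.clamp(min=0),
                     reduction="none")[labels >= 0].mean()     # sparse labels only
loss = ce + 0.1 * neighbor_smoothness(logits.softmax(1), image)
```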

Implicit Feature Refinement for Instance Segmentation

1 code implementation 9 Dec 2021 Lufan Ma, Tiancai Wang, Bin Dong, Jiangpeng Yan, Xiu Li, Xiangyu Zhang

Our IFR enjoys several advantages: 1) it simulates an infinite-depth refinement network while only requiring the parameters of a single residual block; 2) it produces high-level equilibrium instance features with a global receptive field; 3) it serves as a plug-and-play general module that is easily extended to most object recognition frameworks.

Instance Segmentation · Object Recognition +3
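
The "infinite-depth with one block's parameters" claim is the signature of an equilibrium model, which a few lines can illustrate: iterate a single weight-tied residual block toward a fixed point. The plain iteration and dimensions below are simplifications; the paper's equilibrium solve and gradient treatment are more sophisticated.

```python
# Sketch of implicit refinement: one weight-tied block iterated to an
# approximate fixed point simulates an infinite-depth refiner.
import torch
import torch.nn as nn

class ImplicitRefiner(nn.Module):
    def __init__(self, dim: int = 256, iters: int = 20):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                   nn.Linear(dim, dim))
        self.iters = iters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)
        for _ in range(self.iters):   # iterate z <- block(z + x)
            z = self.block(z + x)
        return z                      # approximate equilibrium features

refined = ImplicitRefiner()(torch.randn(8, 256))
```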

SOLQ: Segmenting Objects by Learning Queries

1 code implementation NeurIPS 2021 Bin Dong, Fangao Zeng, Tiancai Wang, Xiangyu Zhang, Yichen Wei

Moreover, the joint learning of the unified query representation can greatly improve the detection performance of DETR.

Ranked #5 on Object Detection on COCO minival (AP75 metric)

Instance Segmentation · Object Detection +2
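
The unified query representation can be sketched as three parallel heads on each decoder query: class, box, and a compressed mask vector (DCT coefficients in the paper, decoded to a binary mask at inference). Dimensions below are illustrative.

```python
# Sketch of SOLQ-style unified query learning: each object query jointly
# predicts class, box, and a compressed mask vector.
import torch
import torch.nn as nn

class UnifiedQueryHead(nn.Module):
    def __init__(self, dim=256, num_classes=80, mask_coeffs=256):
        super().__init__()
        self.cls_head = nn.Linear(dim, num_classes)
        self.box_head = nn.Linear(dim, 4)             # cx, cy, w, h
        self.mask_head = nn.Linear(dim, mask_coeffs)  # DCT vector of the mask

    def forward(self, queries):
        # queries: (batch, num_queries, dim) from the DETR decoder.
        return (self.cls_head(queries), self.box_head(queries).sigmoid(),
                self.mask_head(queries))

cls, box, mask_vec = UnifiedQueryHead()(torch.randn(1, 100, 256))
print(cls.shape, box.shape, mask_vec.shape)
```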

Implicit Feature Pyramid Network for Object Detection

no code implementations 25 Dec 2020 Tiancai Wang, Xiangyu Zhang, Jian Sun

In this paper, we present an implicit feature pyramid network (i-FPN) for object detection.

Object · object-detection +1

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection

1 code implementation 3 Dec 2020 Tiancai Wang, Tong Yang, Jiale Cao, Xiangyu Zhang

Object detectors usually achieve promising results with the supervision of complete instance annotations.

Multi-View Learning · Object +4
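
The co-mining mechanism admits a one-function sketch: confident predictions from one branch of a Siamese detector are merged into the incomplete annotations supervising the other branch. The score-threshold-only merging rule below is a simplified assumption (no NMS or IoU de-duplication).

```python
# Sketch of the co-mining step: one branch's confident detections
# complement the sparse ground truth used to supervise the other branch.
import torch

def complement_labels(sparse_gt: torch.Tensor, peer_boxes: torch.Tensor,
                      peer_scores: torch.Tensor, thresh: float = 0.8):
    # sparse_gt: (M, 4) annotated boxes; peer_boxes: (N, 4); peer_scores: (N,).
    mined = peer_boxes[peer_scores > thresh]   # pseudo labels from the peer
    return torch.cat([sparse_gt, mined], dim=0)

gt = torch.rand(2, 4)                          # only 2 instances annotated
targets = complement_labels(gt, torch.rand(50, 4), torch.rand(50))
print(targets.shape)
```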

Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation

1 code implementation 13 Aug 2020 Jialian Wu, Liangchen Song, Tiancai Wang, Qian Zhang, Junsong Yuan

In the classification tree, as there are significantly fewer parent class nodes, their logits are less noisy and can be utilized to suppress the wrong/noisy logits in the fine-grained class nodes.

Classification · Few-Shot Object Detection +7
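
The suppression mechanism can be sketched as gating: fine-grained class probabilities are multiplied by the probability of their parent node, so a confident parent-level "no" damps noisy logits among its rare children. The two-level hierarchy and mapping below are toy assumptions.

```python
# Sketch of parent-node logit suppression for long-tailed classification:
# gate fine-grained probabilities by the (less noisy) parent probabilities.
import torch

def calibrated_probs(fine_logits, parent_logits, parent_of):
    # fine_logits: (B, C_fine); parent_logits: (B, C_parent);
    # parent_of: (C_fine,) index of each fine class's parent node.
    parent_probs = parent_logits.softmax(-1)        # fewer nodes, less noisy
    fine_probs = fine_logits.softmax(-1)
    return fine_probs * parent_probs[:, parent_of]  # gate children by parent

parent_of = torch.tensor([0, 0, 1, 1, 1])           # 5 fine classes, 2 parents
probs = calibrated_probs(torch.randn(2, 5), torch.randn(2, 2), parent_of)
print(probs.shape)
```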

Learning Human-Object Interaction Detection using Interaction Points

1 code implementation CVPR 2020 Tiancai Wang, Tong Yang, Martin Danelljan, Fahad Shahbaz Khan, Xiangyu Zhang, Jian Sun

Human-object interaction (HOI) detection strives to localize both the human and the object, as well as to identify the complex interactions between them.

Human-Object Interaction Detection · Keypoint Detection +2
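
One way to picture the interaction-point formulation: an interaction is predicted as a point (roughly midway between a human and an object), and detected human/object boxes are paired by how close their midpoint falls to that point. The greedy matching rule below is a simplified assumption.

```python
# Sketch of pairing detections via a predicted interaction point.
import torch

def pair_by_interaction_point(point, human_centers, object_centers):
    # point: (2,) predicted interaction point; *_centers: (N, 2) box centers.
    midpoints = (human_centers[:, None, :] + object_centers[None, :, :]) / 2
    dists = (midpoints - point).norm(dim=-1)     # (N_humans, N_objects)
    h, o = divmod(dists.argmin().item(), dists.shape[1])
    return h, o                                  # indices of the matched pair

h, o = pair_by_interaction_point(torch.rand(2), torch.rand(3, 2), torch.rand(4, 2))
print(h, o)
```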
