Search Results for author: Xiaoyu Tian

Found 7 papers, 3 papers with code

Unsupervised Learning of 3D Scene Flow from Monocular Camera

1 code implementation • 8 Jun 2022 • Guangming Wang, Xiaoyu Tian, Ruiqi Ding, Hesheng Wang

In this paper, unsupervised learning of scene flow consists of two main parts: (i) depth estimation and camera pose estimation, and (ii) scene flow estimation based on four different loss functions.

Depth Estimation • Optical Flow Estimation • +2
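The estimated depth and camera pose above determine the rigid (camera-induced) part of scene flow: back-project each pixel to 3D, transform it by the relative pose, and take the 3D displacement. A minimal sketch of that geometry, assuming a pinhole intrinsics matrix `K` and relative pose `(R, t)`; the function names and the toy 2x2 depth map are illustrative, not from the paper:

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map to 3D points in the camera frame (3 x N)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    return np.linalg.inv(K) @ pix * depth.reshape(1, -1)

def rigid_scene_flow(depth, K, R, t):
    """Flow induced by camera motion alone: (R P + t) - P, per pixel."""
    P = backproject(depth, K)
    P2 = R @ P + t.reshape(3, 1)
    return (P2 - P).T.reshape(*depth.shape, 3)

K = np.array([[100.0, 0.0, 1.0], [0.0, 100.0, 1.0], [0.0, 0.0, 1.0]])
depth = np.full((2, 2), 5.0)     # toy constant-depth scene
R = np.eye(3)                    # no rotation between frames
t = np.array([0.1, 0.0, 0.0])    # pure x-translation
flow = rigid_scene_flow(depth, K, R, t)
# with identity rotation, every point's 3D displacement equals t
```

Residual motion beyond this rigid component is what the paper's scene flow losses must account for.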

VectorFlow: Combining Images and Vectors for Traffic Occupancy and Flow Prediction

no code implementations • 9 Aug 2022 • Xin Huang, Xiaoyu Tian, Junru Gu, Qiao Sun, Hang Zhao

Recently, the occupancy flow fields representation was proposed to represent joint future states of road agents through a combination of occupancy grid and flow, which supports efficient and consistent joint predictions.

Autonomous Driving
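The occupancy-flow-fields representation described above pairs each grid cell with an occupancy probability and a flow vector, and the two are linked by warping occupancy along the flow. A minimal sketch under simplifying assumptions (integer per-cell flow, nearest-cell backward warp); the grid size, warp rule, and values are illustrative, not VectorFlow's actual formulation:

```python
import numpy as np

def warp_occupancy(occ, flow):
    """Backward-warp an occupancy grid one step using integer per-cell flow."""
    h, w = occ.shape
    out = np.zeros_like(occ)
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y, x]
            sy, sx = y - int(dy), x - int(dx)  # source cell for this target
            if 0 <= sy < h and 0 <= sx < w:
                out[y, x] = occ[sy, sx]
    return out

occ = np.zeros((4, 4))
occ[1, 1] = 1.0               # one occupied cell
flow = np.ones((4, 4, 2))     # every cell moves by (+1, +1)
next_occ = warp_occupancy(occ, flow)
# the occupied mass moves from cell (1, 1) to cell (2, 2)
```

Because future occupancy is tied to the flow field in this way, predictions of the two quantities stay mutually consistent, which is the property the abstract highlights.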

Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving

1 code implementation • NeurIPS 2023 • Xiaoyu Tian, Tao Jiang, Longfei Yun, Yucheng Mao, Huitong Yang, Yue Wang, Yilun Wang, Hang Zhao

3D occupancy prediction, which estimates the detailed occupancy states and semantics of a scene, is an emerging task to overcome these limitations.

Autonomous Driving

GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training

1 code implementation • CVPR 2023 • Xiaoyu Tian, Haoxi Ran, Yue Wang, Hang Zhao

This paper tries to address a fundamental question in point cloud self-supervised learning: what is a good signal we should leverage to learn features from point clouds without annotations?

Multi-Object Tracking • Object Detection • +3

DUMA: a Dual-Mind Conversational Agent with Fast and Slow Thinking

no code implementations • 27 Oct 2023 • Xiaoyu Tian, Liangyu Chen, Na Liu, Yaxuan Liu, Wei Zou, Kaijiang Chen, Ming Cui

The fast-thinking model serves as the primary interface for external interactions and initial response generation, and decides, based on the complexity of the complete response, whether the slow-thinking model needs to be engaged.

Response Generation
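The dual-mind dispatch described above can be sketched as a simple routing function: the fast model always produces a draft, and a gate decides whether to escalate to the slow model. Everything here is a hypothetical stand-in; in particular, the word-count gate is a placeholder, since DUMA's actual gating is learned rather than a heuristic:

```python
def fast_model(query: str) -> str:
    # stand-in for the fast-thinking model
    return f"quick answer to: {query}"

def slow_model(query: str, draft: str) -> str:
    # stand-in for the slow-thinking model, refining the fast draft
    return f"deliberated answer to: {query} (refining '{draft}')"

def respond(query: str, complexity_threshold: int = 8) -> str:
    draft = fast_model(query)                      # fast model always runs first
    if len(query.split()) > complexity_threshold:  # placeholder complexity gate
        return slow_model(query, draft)            # escalate complex queries
    return draft

print(respond("hi"))  # simple query: handled by the fast model alone
```

Routing this way keeps latency low on simple turns while reserving the expensive slow model for turns the gate judges complex.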

From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models

no code implementations • 5 Jan 2024 • Na Liu, Liangyu Chen, Xiaoyu Tian, Wei Zou, Kaijiang Chen, Ming Cui

This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of Large Language Models (LLMs) like GPT-4 into conversational agents.

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

no code implementations • 19 Feb 2024 • Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Chenxu Hu, Yang Wang, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities.

Autonomous Driving • Scene Understanding
