no code implementations • 30 Sep 2024 • Yubin Wang, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Errui Ding, Cairong Zhao
We present Uni$^2$Det, a new framework for unified and universal multi-dataset training on 3D detection, enabling robust performance across diverse domains and generalization to unseen domains.
1 code implementation • 24 Sep 2024 • Lingyu Xiao, Jiang-Jiang Liu, Sen Yang, Xiaofan Li, Xiaoqing Ye, Wankou Yang, Jingdong Wang
In this paper, we explore the feasibility of deriving decisions from an autoregressive world model by addressing these challenges through the formulation of multiple probabilistic hypotheses.
1 code implementation • 1 Sep 2024 • Dingyuan Zhang, Dingkang Liang, Zichang Tan, Xiaoqing Ye, Cheng Zhang, Jingdong Wang, Xiang Bai
Slow inference speed is one of the most crucial concerns for deploying multi-view 3D detectors to tasks with high real-time requirements like autonomous driving.
1 code implementation • 25 Jul 2024 • Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai
To tackle this problem, we simply introduce a 3D spatial feature descriptor and integrate it into the linear group RNN operators to enhance their spatial features rather than blindly increasing the number of scanning orders for voxel features.
Ranked #1 on 3D Object Detection on Waymo Open Dataset
1 code implementation • 22 Jul 2024 • Yiran Yang, Xu Gao, Tong Wang, Xin Hao, Yifeng Shi, Xiao Tan, Xiaoqing Ye, Jingdong Wang
This module adjusts the feature distributions from both the camera and LiDAR, bringing them closer to the ground truth domain and minimizing differences.
1 code implementation • 15 Jul 2024 • Zhe Liu, Jinghua Hou, Xiaoqing Ye, Tong Wang, Jingdong Wang, Xiang Bai
We argue that the main challenges are twofold: 1) obtaining appropriate object queries is difficult due to the high sparsity and uneven distribution of point clouds; 2) how to implement effective query interaction by exploiting the rich geometric structure of point clouds is not fully explored.
1 code implementation • 15 Jul 2024 • Jinghua Hou, Tong Wang, Xiaoqing Ye, Zhe Liu, Shi Gong, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai
Accurate depth information is crucial for enhancing the performance of multi-view 3D object detection.
1 code implementation • 9 Jul 2024 • Jiankun Li, Hao Li, Jiang-Jiang Liu, Zhikang Zou, Xiaoqing Ye, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang
Deep learning-based models are widely deployed in autonomous driving, especially the increasingly prominent end-to-end solutions.
1 code implementation • 8 Jul 2024 • Yumeng Zhang, Shi Gong, Kaixin Xiong, Xiaoqing Ye, Xiao Tan, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang
The world model consists of two parts: the multi-modal tokenizer and the latent BEV sequence diffusion model.
1 code implementation • 1 Jul 2024 • Dingkang Liang, Wei Hua, Chunsheng Shi, Zhikang Zou, Xiaoqing Ye, Xiang Bai
Specifically, we observe that objects in aerial images usually have arbitrary orientations, small scales, and dense aggregation, which inspires the following core designs: a Simple Instance-aware Dense Sampling (SIDS) strategy generates comprehensive dense pseudo-labels; the Geometry-aware Adaptive Weighting (GAW) loss dynamically modulates the importance of each pseudo-label/prediction pair by leveraging the intricate geometric information of aerial objects; and we treat aerial images as global layouts, explicitly building the many-to-many relationship between the sets of pseudo-labels and predictions via the proposed Noise-driven Global Consistency (NGC).
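The GAW loss is only summarized above; as a rough illustration of the idea of weighting each pseudo-label/prediction pair by box geometry, the sketch below emphasizes small, elongated objects. The weighting rule, the box representation, and all names (`gaw_loss`, `w`, `h`) are assumptions for illustration, not the paper's formulation.

```python
import math

def gaw_loss(pairs, loss_fn):
    """Hypothetical Geometry-aware Adaptive Weighting: modulate each
    (pseudo_label, prediction) pair's loss by a weight derived from
    the pseudo-box geometry (here: favor small, elongated boxes)."""
    total = 0.0
    for pseudo, pred in pairs:
        w, h = pseudo["w"], pseudo["h"]
        scale = math.sqrt(w * h)                   # object size
        aspect = max(w, h) / max(min(w, h), 1e-6)  # elongation
        weight = math.log1p(aspect) / (1.0 + scale / 32.0)
        total += weight * loss_fn(pseudo, pred)
    return total / max(len(pairs), 1)
```

With a constant per-pair loss, a small 4x16 box contributes more to the total than a large 40x160 one, mimicking the stated emphasis on small aerial objects.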
1 code implementation • CVPR 2024 • Wenjie Wang, Yehao Lu, Guangcong Zheng, Shuigen Zhan, Xiaoqing Ye, Zichang Tan, Jingdong Wang, Gaoang Wang, Xi Li
Vision-based roadside 3D object detection has attracted rising attention in the autonomous driving domain, since it encompasses inherent advantages in reducing blind spots and expanding perception range.
1 code implementation • 22 Apr 2024 • Guibiao Liao, Jiankun Li, Zhenyu Bao, Xiaoqing Ye, Jingdong Wang, Qing Li, Kanglin Liu
Additionally, to address the semantic ambiguity caused by utilizing view-inconsistent 2D CLIP semantics to supervise Gaussians, we introduce a 3D Coherent Self-training (3DCS) strategy, resorting to the multi-view consistency originating from the 3D model.
1 code implementation • 16 Feb 2024 • Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Xiang Bai
Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs.
no code implementations • 11 Oct 2023 • Xiaofan Li, Yifu Zhang, Xiaoqing Ye
To alleviate the problem, we propose DrivingDiffusion, a spatial-temporally consistent diffusion framework that generates realistic multi-view videos controlled by 3D layout.
no code implementations • ICCV 2023 • Xiang Guo, Jiadai Sun, Yuchao Dai, GuanYing Chen, Xiaoqing Ye, Xiao Tan, Errui Ding, Yumeng Zhang, Jingdong Wang
This paper proposes a neural radiance field (NeRF) approach for novel view synthesis of dynamic scenes using forward warping.
no code implementations • 5 Sep 2023 • Xin Zhou, Jinghua Hou, Tingting Yao, Dingkang Liang, Zhe Liu, Zhikang Zou, Xiaoqing Ye, Jianwei Cheng, Xiang Bai
3D object detection is an essential task for achieving autonomous driving.
1 code implementation • 4 Jun 2023 • Dingyuan Zhang, Dingkang Liang, Hongcheng Yang, Zhikang Zou, Xiaoqing Ye, Zhe Liu, Xiang Bai
In the spirit of unleashing the capability of foundation models on vision tasks, the Segment Anything Model (SAM), a vision foundation model for image segmentation, has been proposed recently and presents strong zero-shot ability on many downstream 2D tasks.
1 code implementation • 17 May 2023 • Jiang-Tian Zhai, Ze Feng, Jinhao Du, Yongqiang Mao, Jiang-Jiang Liu, Zichang Tan, Yifu Zhang, Xiaoqing Ye, Jingdong Wang
Modern autonomous driving systems are typically divided into three main tasks: perception, prediction, and planning.
Ranked #1 on Trajectory Planning on nuScenes
1 code implementation • 12 May 2023 • Zhe Liu, Xiaoqing Ye, Zhikang Zou, Xinwei He, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai
Extensive experiments on the nuScenes dataset demonstrate that our method is much more stable in dealing with challenging cases such as asynchronous sensors, misaligned sensor placement, and degenerated camera images than existing fusion methods.
Ranked #48 on 3D Object Detection on nuScenes
1 code implementation • CVPR 2023 • Wei Hua, Dingkang Liang, Jingyu Li, Xiaolong Liu, Zhikang Zou, Xiaoqing Ye, Xiang Bai
Semi-Supervised Object Detection (SSOD), which aims to exploit unlabeled data to boost object detectors, has become an active research topic in recent years.
2 code implementations • CVPR 2023 • Dingkang Liang, Jiahao Xie, Zhikang Zou, Xiaoqing Ye, Wei Xu, Xiang Bai
To the best of our knowledge, CrowdCLIP is the first to investigate vision-language knowledge for the counting problem.
Ranked #1 on Cross-Part Crowd Counting on ShanghaiTech B
no code implementations • 27 Mar 2023 • Yifu Zhang, Xinggang Wang, Xiaoqing Ye, Wei Zhang, Jincheng Lu, Xiao Tan, Errui Ding, Peize Sun, Jingdong Wang
We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes, which alleviates the problems of object missing and fragmented trajectories.
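As a rough sketch of the hierarchical association idea described above (match tracks against high-score detections first, then try the leftover tracks against low-score boxes instead of discarding them), under assumed names and thresholds, and with a simple greedy IoU matcher rather than the paper's actual procedure:

```python
def iou(a, b):
    """Axis-aligned IoU of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / max(area_a + area_b - inter, 1e-9)

def greedy_match(tracks, dets, iou_thresh):
    """Greedily match each track to its best unused detection."""
    matches, unmatched, used = [], [], set()
    for t in tracks:
        best, best_iou = None, iou_thresh
        for j, d in enumerate(dets):
            if j in used:
                continue
            v = iou(t["box"], d["box"])
            if v > best_iou:
                best, best_iou = j, v
        if best is None:
            unmatched.append(t)
        else:
            used.add(best)
            matches.append((t, dets[best]))
    return matches, unmatched

def hierarchical_associate(tracks, detections, high_thresh=0.6, iou_thresh=0.3):
    """Stage 1: associate tracks with high-score detections.
    Stage 2: give leftover tracks a chance against low-score boxes,
    recovering true objects that would otherwise be discarded."""
    high = [d for d in detections if d["score"] >= high_thresh]
    low = [d for d in detections if d["score"] < high_thresh]
    matches, leftover = greedy_match(tracks, high, iou_thresh)
    matches2, unmatched = greedy_match(leftover, low, iou_thresh)
    return matches + matches2, unmatched
```

A track overlapping only a low-score box (e.g. a partially occluded object) is recovered in the second stage rather than spawning a fragmented trajectory.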
1 code implementation • ICCV 2023 • Jianhui Liu, Yukang Chen, Xiaoqing Ye, Xiaojuan Qi
Category-level 6D pose estimation aims to predict the poses and sizes of unseen objects from a specific category.
2 code implementations • CVPR 2023 • Kaixin Xiong, Shi Gong, Xiaoqing Ye, Xiao Tan, Ji Wan, Errui Ding, Jingdong Wang, Xiang Bai
In this paper, we address the problem of detecting 3D objects from multi-view images.
Ranked #9 on 3D Object Detection on nuScenes Camera Only
no code implementations • 4 Jan 2023 • Zhe Liu, Xiaoqing Ye, Xiao Tan, Errui Ding, Xiang Bai
In this paper, we propose StereoDistill, a cross-modal distillation method that narrows the gap between stereo- and LiDAR-based approaches by distilling stereo detectors from a superior LiDAR model at the response level, which is usually overlooked in 3D object detection distillation.
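Response-level distillation is commonly implemented as Hinton-style knowledge distillation on the detector's outputs; the sketch below shows that generic form with a temperature-softened KL divergence. It is illustrative only: StereoDistill's actual response-level losses are not reproduced here, and all names and the temperature value are assumptions.

```python
import math

def softened_softmax(logits, T):
    """Temperature-softened softmax (numerically stable)."""
    m = max(x / T for x in logits)
    exps = [math.exp(x / T - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def response_distill_loss(student_logits, teacher_logits, T=2.0):
    """Generic response-level KD: KL(teacher || student) on softened
    class distributions, scaled by T^2 as in standard distillation.
    Here the student would be the stereo detector, the teacher the
    LiDAR detector."""
    p = softened_softmax(teacher_logits, T)
    q = softened_softmax(student_logits, T)
    return (T * T) * sum(pi * math.log(pi / max(qi, 1e-12))
                         for pi, qi in zip(p, q))
```

The loss is zero when the student already matches the teacher's output distribution and grows as their softened responses diverge.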
no code implementations • ICCV 2023 • Dingyuan Zhang, Dingkang Liang, Zhikang Zou, Jingyu Li, Xiaoqing Ye, Zhe Liu, Xiao Tan, Xiang Bai
Advanced 3D object detection methods usually rely on large-scale, elaborately labeled datasets to achieve good performance.
no code implementations • CVPR 2023 • Ruihang Chu, Zhengzhe Liu, Xiaoqing Ye, Xiao Tan, Xiaojuan Qi, Chi-Wing Fu, Jiaya Jia
The key of Cart is to utilize the prediction of object structures to connect visual observations with user commands for effective manipulations.
no code implementations • 11 Oct 2022 • Yue He, Minyue Jiang, Xiaoqing Ye, Liang Du, Zhikang Zou, Wei Zhang, Xiao Tan, Errui Ding
In this paper, we aim to find an enhanced feature space where lane features are distinctive while maintaining a similar distribution of lanes in the wild.
1 code implementation • 8 Oct 2022 • Peizhe Jiang, Wei Yang, Xiaoqing Ye, Xiao Tan, Meng Wu
Monocular depth estimation (MDE) in the self-supervised scenario has emerged as a promising method, as it removes the requirement for ground-truth depth.
7 code implementations • 5 Oct 2022 • Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li
The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.
no code implementations • 28 Sep 2022 • Jianhui Liu, Yukang Chen, Xiaoqing Ye, Zhuotao Tian, Xiao Tan, Xiaojuan Qi
3D scenes are dominated by a large number of background points, which is redundant for the detection task that mainly needs to focus on foreground objects.
no code implementations • 24 Aug 2022 • Liang Du, Xiaoqing Ye, Xiao Tan, Edward Johns, Bo Chen, Errui Ding, Xiangyang Xue, Jianfeng Feng
A feasible method is investigated to construct conceptual scenes without external datasets.
no code implementations • 12 Jul 2022 • Bo Ju, Zhikang Zou, Xiaoqing Ye, Minyue Jiang, Xiao Tan, Errui Ding, Jingdong Wang
In this work, we propose a novel semantic passing framework, named SPNet, to boost the performance of existing LiDAR-based 3D detection models with the guidance of rich context painting, with no extra computation cost during inference.
no code implementations • 15 Jun 2022 • Xiang Guo, GuanYing Chen, Yuchao Dai, Xiaoqing Ye, Jiadai Sun, Xiao Tan, Errui Ding
The second module contains a density and a color grid to model the geometry and density of the scene.
no code implementations • 16 Apr 2022 • Shi Gong, Xiaoqing Ye, Xiao Tan, Jingdong Wang, Errui Ding, Yu Zhou, Xiang Bai
Bird's-eye-view (BEV) semantic segmentation is critical for autonomous driving due to its powerful spatial representation ability.
no code implementations • CVPR 2022 • Xiaoqing Ye, Mao Shu, Hanyu Li, Yifeng Shi, YingYing Li, Guangjie Wang, Xiao Tan, Errui Ding
On the other hand, the data captured from roadside cameras have strengths over frontal-view data, which is believed to facilitate a safer and more intelligent autonomous driving system.
no code implementations • CVPR 2022 • Ruihang Chu, Xiaoqing Ye, Zhengzhe Liu, Xiao Tan, Xiaojuan Qi, Chi-Wing Fu, Jiaya Jia
We explore the way to alleviate the label-hungry problem in a semi-supervised setting for 3D instance segmentation.
no code implementations • ICCV 2021 • Zhikang Zou, Xiaoqing Ye, Liang Du, Xianhui Cheng, Xiao Tan, Li Zhang, Jianfeng Feng, Xiangyang Xue, Errui Ding
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving, whereas its accuracy is still far from satisfactory.
1 code implementation • 3 Dec 2021 • Zheyuan Zhou, Liang Du, Xiaoqing Ye, Zhikang Zou, Xiao Tan, Li Zhang, Xiangyang Xue, Jianfeng Feng
Monocular 3D object detection aims to predict the object location, dimension and orientation in 3D space alongside the object category given only a monocular image.
no code implementations • 27 Jul 2021 • Zhikang Zou, Xiaoye Qu, Pan Zhou, Shuangjie Xu, Xiaoqing Ye, Wenhao Wu, Jin Ye
Specifically, at the coarse-grained stage, we design a dual-discriminator strategy to adapt the source domain to be close to the targets from the perspectives of both global and local feature space via adversarial learning.
1 code implementation • 22 Apr 2021 • Qiming Wu, Zhikang Zou, Pan Zhou, Xiaoqing Ye, Binghui Wang, Ang Li
Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems.
1 code implementation • CVPR 2021 • Li Wang, Liang Du, Xiaoqing Ye, Yanwei Fu, Guodong Guo, Xiangyang Xue, Jianfeng Feng, Li Zhang
The objective of this paper is to learn context- and depth-aware feature representation to solve the problem of monocular 3D object detection.
Ranked #14 on Monocular 3D Object Detection on KITTI Cars Moderate
no code implementations • ICCV 2021 • Zhi Chen, Xiaoqing Ye, Wei Yang, Zhenbo Xu, Xiao Tan, Zhikang Zou, Errui Ding, Xinming Zhang, Liusheng Huang
Second, we introduce an occlusion-aware distillation (OA Distillation) module, which leverages the predicted depths from StereoNet in non-occluded regions to train our monocular depth estimation network named SingleNet.
no code implementations • CVPR 2020 • Liang Du, Xiaoqing Ye, Xiao Tan, Jianfeng Feng, Zhenbo Xu, Errui Ding, Shilei Wen
Object detection from 3D point clouds remains a challenging task, though recent studies have pushed the envelope with deep learning techniques.
1 code implementation • 1 Mar 2020 • Zhenbo Xu, Wei Zhang, Xiaoqing Ye, Xiao Tan, Wei Yang, Shilei Wen, Errui Ding, Ajin Meng, Liusheng Huang
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
no code implementations • ECCV 2018 • Xiaoqing Ye, Jiamao Li, Hexiao Huang, Liang Du, Xiaolin Zhang
Semantic segmentation of 3D unstructured point clouds remains an open research problem.