As a pioneering work, a dynamic architecture network for medical volumetric segmentation (i.e., Med-DANet) has achieved a favorable accuracy-efficiency trade-off by dynamically selecting a suitable 2D candidate model from a predefined model bank for each slice.
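The per-slice selection idea can be illustrated with a toy sketch. It is not Med-DANet's implementation: the two "models", the slice representation, and `heuristic_selector` are all hypothetical stand-ins. In particular, the selector here is a hand-written rule that sends nearly blank slices to the cheapest model, whereas the original work uses a learned decision network.

```python
from typing import Callable, List, Sequence, Tuple

Slice = List[List[float]]  # toy 2D slice: rows of intensities in [0, 1]


def empty_model(s: Slice) -> Slice:
    # Cheapest hypothetical candidate: predict background everywhere.
    return [[0.0 for _ in row] for row in s]


def threshold_model(s: Slice) -> Slice:
    # Hypothetical stand-in for a heavier 2D segmentation network.
    return [[1.0 if v > 0.5 else 0.0 for v in row] for row in s]


def heuristic_selector(s: Slice) -> List[float]:
    # Hypothetical stand-in for a learned decision network: returns one score
    # per candidate model; nearly blank slices favour model 0 (the cheap one).
    mean = sum(sum(row) for row in s) / sum(len(row) for row in s)
    return [1.0 - mean, mean]


def select_and_segment(
    slices: Sequence[Slice],
    model_bank: Sequence[Callable[[Slice], Slice]],
    selector: Callable[[Slice], List[float]],
) -> Tuple[List[Slice], List[int]]:
    """Segment each slice with the candidate model the selector scores highest."""
    masks, choices = [], []
    for s in slices:
        scores = selector(s)
        k = max(range(len(scores)), key=scores.__getitem__)
        choices.append(k)
        masks.append(model_bank[k](s))
    return masks, choices


volume = [
    [[0.0, 0.0], [0.0, 0.1]],  # nearly empty slice
    [[0.9, 0.8], [0.2, 0.7]],  # slice containing foreground
]
masks, choices = select_and_segment(volume, [empty_model, threshold_model], heuristic_selector)
```

Here the empty slice is routed to model 0 and the foreground slice to model 1, so compute is spent only where it is likely to matter.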
To alleviate the problem, we propose DrivingDiffusion, a spatial-temporal consistent diffusion framework that generates realistic multi-view videos controlled by a 3D layout.
Targets in urban traffic scenes often undergo occlusion, illumination changes, and perspective changes, making it difficult to associate targets across different cameras accurately.
Transfer learning offers a possible solution, but natural images contain many features that are unrelated to the target domain.
To ensure that the target-domain data fed to the different sub-networks within the same batch are exactly the same, we designed a multi-source domain-independent strategy, which allows the subsequent local feature fusion to recover the key features required.
Modern autonomous driving systems are typically divided into three main tasks: perception, prediction, and planning.
Ranked #1 on Trajectory Planning on nuScenes
We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes, which alleviates the problems of object missing and fragmented trajectories.
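The two-stage association can be sketched as follows: tracks are first matched to high-score detections, and the tracks left over are then matched to low-score detections, which is where occluded objects tend to appear. This is a minimal sketch under stated assumptions: matching is greedy on IoU (a Hungarian solver is the more usual choice), track boxes are assumed to be already motion-predicted, and the thresholds `high_thresh`, `low_thresh`, and `min_iou` are illustrative values, not taken from the source.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(track_ids: Sequence[int], tracks: Dict[int, Box],
                 det_ids: Sequence[int], dets: Sequence[Detection], min_iou: float):
    """Greedy IoU matching (a simplification of optimal assignment)."""
    pairs = sorted(((iou(tracks[t], dets[d][0]), t, d)
                    for t in track_ids for d in det_ids), reverse=True)
    matches: List[Tuple[int, int]] = []
    used_t, used_d = set(), set()
    for score, t, d in pairs:
        if score < min_iou:
            break
        if t in used_t or d in used_d:
            continue
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)
    return (matches, [t for t in track_ids if t not in used_t],
            [d for d in det_ids if d not in used_d])


def two_stage_associate(tracks: Dict[int, Box], dets: Sequence[Detection],
                        high_thresh: float = 0.6, low_thresh: float = 0.1,
                        min_iou: float = 0.3):
    high = [i for i, (_, s) in enumerate(dets) if s >= high_thresh]
    low = [i for i, (_, s) in enumerate(dets) if low_thresh <= s < high_thresh]
    # Stage 1: all tracks against high-score detections.
    m1, rem_tracks, rem_high = greedy_match(list(tracks), tracks, high, dets, min_iou)
    # Stage 2: leftover tracks against low-score detections.
    m2, lost_tracks, _ = greedy_match(rem_tracks, tracks, low, dets, min_iou)
    # Unmatched low-score boxes are discarded as background;
    # unmatched high-score boxes are candidates for new tracks.
    return m1 + m2, lost_tracks, rem_high


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
dets = [((1, 1, 11, 11), 0.9), ((21, 21, 31, 31), 0.3), ((50, 50, 60, 60), 0.8)]
matches, lost, new = two_stage_associate(tracks, dets)
```

In this example, track 2 keeps its identity through a low-confidence (score 0.3) detection that a single-threshold tracker would have thrown away, and the unmatched high-score detection 2 would seed a new track.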
To address the problem, we present an efficient approach to compute a marginal probability for each pair of objects in real time.
Given point clouds of the source and target scenes, we propose a three-step PlaneSDF-based change detection approach: (1) PlaneSDF volumes are instantiated within each scene and registered across scenes using plane poses; 2D height maps and object maps are extracted per volume via height projection and connected component analysis.
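The extraction of 2D height maps and object maps in step (1) can be sketched with a max-height grid projection followed by connected component labelling. The sketch simplifies heavily: it assumes the points are already expressed in a plane's local frame (so `z` is the height above the plane), and the cell size, `min_height` threshold, and 4-connectivity are illustrative choices, not details from the source. It does not build PlaneSDF volumes.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z), z = height above the plane (assumed)


def height_map(points: Sequence[Point], cell: float, nx: int, ny: int) -> List[List[Optional[float]]]:
    """Height projection: keep the maximum z per (x, y) cell; None marks empty cells."""
    grid: List[List[Optional[float]]] = [[None] * nx for _ in range(ny)]
    for x, y, z in points:
        i, j = int(y // cell), int(x // cell)
        if 0 <= i < ny and 0 <= j < nx and (grid[i][j] is None or z > grid[i][j]):
            grid[i][j] = z
    return grid


def object_map(hmap: List[List[Optional[float]]], min_height: float) -> Tuple[List[List[int]], int]:
    """Label 4-connected cells taller than min_height via BFS; 0 means background."""
    ny, nx = len(hmap), len(hmap[0])
    labels = [[0] * nx for _ in range(ny)]
    n_labels = 0
    for i in range(ny):
        for j in range(nx):
            h = hmap[i][j]
            if h is None or h <= min_height or labels[i][j]:
                continue
            n_labels += 1
            labels[i][j] = n_labels
            queue = deque([(i, j)])
            while queue:
                ci, cj = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = ci + di, cj + dj
                    if 0 <= ni < ny and 0 <= nj < nx and not labels[ni][nj]:
                        nh = hmap[ni][nj]
                        if nh is not None and nh > min_height:
                            labels[ni][nj] = n_labels
                            queue.append((ni, nj))
    return labels, n_labels


# A flat 4x4 floor plus two raised objects (one spanning two cells).
points = [(x + 0.5, y + 0.5, 0.0) for x in range(4) for y in range(4)]
points += [(0.5, 0.5, 0.5), (1.5, 0.5, 0.4), (3.5, 3.5, 0.8)]
hmap = height_map(points, cell=1.0, nx=4, ny=4)
labels, n_objects = object_map(hmap, min_height=0.2)
```

Comparing such per-volume object maps between the source and target scenes is one way to localize changes, though the source text does not specify the comparison step here.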
ByteTrack also achieves state-of-the-art performance on MOT20, HiEve and BDD100K tracking benchmarks.
Ranked #1 on Multiple Object Tracking on BDD100K val
We estimate 3D poses from the voxel representation by predicting whether each voxel contains a particular body joint.
Ranked #7 on 3D Multi-Person Pose Estimation on Panoptic (using extra training data)
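Predicting, for each voxel, how likely it is to contain a given joint yields a per-joint 3D score volume. A continuous joint position can then be read off that volume. One common choice, assumed here rather than taken from the source, is soft-argmax: a softmax-weighted average of the voxel centres, shown next to a plain argmax. The 3×3×3 grid, unit spacing, and `beta` sharpness parameter are illustrative.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def hard_argmax(scores: Sequence[float], centers: Sequence[Vec3]) -> Vec3:
    # Centre of the single most likely voxel (quantized to the grid).
    k = max(range(len(scores)), key=scores.__getitem__)
    return centers[k]


def soft_argmax_3d(logits: Sequence[float], centers: Sequence[Vec3], beta: float = 1.0) -> Vec3:
    # Softmax over voxels, then the expected voxel centre (sub-voxel precision).
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - m)) for v in logits]
    total = sum(weights)
    x, y, z = (sum(w * c[k] for w, c in zip(weights, centers)) / total for k in range(3))
    return (x, y, z)


# Voxel centres of a 3x3x3 grid with unit spacing; one voxel strongly "contains" the joint.
centers = [(float(x), float(y), float(z)) for x in range(3) for y in range(3) for z in range(3)]
logits = [0.0] * len(centers)
logits[centers.index((2.0, 1.0, 0.0))] = 50.0
joint = soft_argmax_3d(logits, centers)
```

With a sharp peak both estimates agree. When probability mass is spread over neighbouring voxels, soft-argmax interpolates between them, which reduces quantization error from the voxel grid.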
Formulating MOT as multi-task learning of object detection and re-ID in a single network is appealing since it allows joint optimization of the two tasks and enjoys high computation efficiency.
Ranked #1 on Multi-Object Tracking on 2DMOT15 (using extra training data)
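Joint optimization of detection and re-ID needs some way to balance the two losses. One common scheme, assumed here as an illustration rather than attributed to the source, is the uncertainty-based weighting of Kendall et al., which adds a learnable scalar per task:

L = ½ (e^(−w_det) · L_det + e^(−w_id) · L_id + w_det + w_id)

For a fixed task loss L, the term e^(−w) · L + w is minimized at w = ln L. The sketch below shows this with plain gradient descent on the two weights alone. The task-loss values, learning rate, and step count are hypothetical. In real training, the weights are updated jointly with the network parameters while the task losses themselves change.

```python
import math
from typing import Tuple


def uncertainty_weighted_loss(l_det: float, l_id: float, w_det: float, w_id: float) -> float:
    # Each task loss is scaled by exp(-w); the +w terms stop w from growing without bound.
    return 0.5 * (math.exp(-w_det) * l_det + math.exp(-w_id) * l_id + w_det + w_id)


def grad_w(task_loss: float, w: float) -> float:
    # d/dw of 0.5 * (exp(-w) * L + w), holding the task loss L fixed.
    return 0.5 * (1.0 - math.exp(-w) * task_loss)


def fit_weights(l_det: float, l_id: float, lr: float = 0.5, steps: int = 200) -> Tuple[float, float]:
    w_det = w_id = 0.0
    for _ in range(steps):
        w_det -= lr * grad_w(l_det, w_det)
        w_id -= lr * grad_w(l_id, w_id)
    return w_det, w_id


w_det, w_id = fit_weights(l_det=4.0, l_id=0.5)
total = uncertainty_weighted_loss(4.0, 0.5, w_det, w_id)
```

The weights settle near ln 4 and ln 0.5, so each weighted task term e^(−w) · L approaches 1. Neither task dominates the shared gradient simply because its raw loss has a larger scale.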