We hope this work can provide insights for the image-based 3D detection community under a semi-supervised setting.
We evaluate our model on several image-to-image (I2I) translation benchmarks, and the results show that the proposed model has advantages over previous methods in both strongly constrained and normally constrained tasks.
Ranked #1 on Style Transfer on WikiArt
Based on PRSlot modules, we present a novel Pyramid Region-based Slot Attention Network termed PRSA-Net to learn a unified visual representation with rich temporal and semantic context for better proposal generation.
Our proposed method mainly leverages intra-modality encoding and cross-modality co-occurrence encoding for full representation modeling.
More importantly, our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers.
A key feature of the multi-label setting is that images often have multiple labels, which typically refer to different regions of the image.
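The multi-label setting described above can be illustrated with a small sketch. It assumes the common multi-hot target encoding and a per-class binary cross-entropy loss; neither is stated in the snippet, and the helper names `multi_hot` and `bce_with_logits` are hypothetical.

```python
import numpy as np

def multi_hot(label_ids, n_classes):
    """Encode a set of class indices as a 0/1 vector (one slot per class)."""
    y = np.zeros(n_classes, dtype=np.float64)
    y[list(label_ids)] = 1.0
    return y

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over classes, computed in a numerically stable form.

    Each class is scored independently, so one image can be positive
    for several classes at once (assumed loss, not taken from the source).
    """
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    per_class = (np.maximum(logits, 0.0)
                 - logits * targets
                 + np.log1p(np.exp(-np.abs(logits))))
    return float(per_class.mean())

# An image labeled with classes 0 and 2 out of 4 possible classes.
y = multi_hot([0, 2], n_classes=4)
loss = bce_with_logits(np.zeros(4), y)
```

Unlike a softmax over classes, the per-class sigmoid form does not force the labels to compete, which matches the case where different labels refer to different regions of the same image.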
It captures spatial-temporal contextual information jointly to augment the individual and group representations effectively with a clustered spatial-temporal transformer.
In this paper, we propose to learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Ranked #5 on Monocular 3D Object Detection on KITTI Cars Moderate
Video crowd localization is a crucial yet challenging task that aims to estimate the exact locations of human heads in crowded videos.
The proposed cloud-based dynamic programming and rule extraction framework with passenger load prediction reduces bus operating costs by 4% and 11% in off-peak and peak hours, respectively.
This study aims to reduce the number of learning iterations required by Q-learning in hybrid electric vehicle (HEV) applications and to improve fuel consumption in the initial learning phases by using warm-start methods.
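As a rough illustration of what a warm start means for tabular Q-learning, the sketch below initializes the Q-table from a prior controller instead of zeros, then applies the standard Q-learning update. The prior policy, the `bonus` value, and the function names are assumptions for illustration only; the snippet does not say which warm-start method the study uses.

```python
import numpy as np

def warm_start_q_table(n_states, n_actions, prior_policy, bonus=1.0):
    """Build a Q-table whose greedy action in each state matches a prior policy.

    prior_policy: integer array of length n_states giving the action a
    hypothetical baseline controller would choose in each state.
    """
    q = np.zeros((n_states, n_actions))
    q[np.arange(n_states), prior_policy] = bonus
    return q

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard one-step Q-learning update, applied in place."""
    td_target = r + gamma * q[s_next].max()
    q[s, a] += alpha * (td_target - q[s, a])
    return q

# Three states, two actions; the assumed prior prefers actions 0, 1, 0.
q = warm_start_q_table(3, 2, np.array([0, 1, 0]))
q_update(q, s=0, a=1, r=1.0, s_next=1)
```

With a zero-initialized table the first greedy actions are arbitrary; with the warm start, the agent begins by imitating the prior controller and only departs from it as the updates accumulate evidence.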
A novel correction algorithm is proposed for multi-class classification problems with corrupted training data.
Extensive experiments on standard benchmarks demonstrate that our end-to-end model achieves a new state of the art for regular and irregular scene text recognition and runs inference 6 times faster than attention-based methods.
Dual-energy (DE) chest radiographs provide greater diagnostic information than standard radiographs by separating the image into bone and soft tissue, revealing suspicious lesions which may otherwise be obstructed from view.