In this paper, we propose CO^3, namely Cooperative Contrastive Learning and Contextual Shape Prediction, to learn 3D representations of outdoor-scene point clouds in an unsupervised manner.
To facilitate future research on exploiting unlabeled data for 3D detection, we additionally provide a benchmark in which we reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
Unsupervised pre-training aims at learning transferable features that are beneficial for downstream tasks.
Here we present GCC-3D, a novel self-supervised 3D object detection framework that seamlessly integrates geometry-aware contrast and clustering harmonization to improve unsupervised 3D representation learning.
1 code implementation • 3 Nov 2020 • Bochao Wang, Hang Xu, Jiajin Zhang, Chen Chen, Xiaozhi Fang, Yixing Xu, Ning Kang, Lanqing Hong, Chenhan Jiang, Xinyue Cai, Jiawei Li, Fengwei Zhou, Yong Li, Zhicheng Liu, Xinghao Chen, Kai Han, Han Shu, Dehua Song, Yunhe Wang, Wei Zhang, Chunjing Xu, Zhenguo Li, Wenzhi Liu, Tong Zhang
Automated Machine Learning (AutoML) is an important industrial solution for the automatic discovery and deployment of machine learning models.
Advanced object detectors usually adopt a backbone network designed for and pretrained on ImageNet classification.
Is a hand-crafted detection network tailored for natural images necessarily good enough for the very different domain of medical lesions?
Each Layout-Graph Reasoning (LGR) layer maps feature representations into structural graph nodes via a Map-to-Node module, performs reasoning over these nodes via a layout-graph reasoning module to achieve global layout coherency, and then maps the graph nodes back via a Node-to-Map module to enhance the feature representations.
Driven by recent computer vision and robotics applications, recovering 3D human poses has become increasingly important and has attracted growing interest.
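The three-stage flow of an LGR layer can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. Map-to-Node is modeled as a softmax soft assignment of each spatial location to K nodes. Reasoning is modeled as one step of propagation over a row-normalized adjacency matrix followed by a linear map and ReLU. Node-to-Map reuses the same assignment matrix and adds a residual connection. The names `lgr_layer`, `w_assign`, `adjacency`, and `w_graph` are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lgr_layer(features, w_assign, adjacency, w_graph):
    """Hypothetical LGR-style layer (illustrative sketch only).

    features:  (N, C) features flattened from an H x W map, N = H * W
    w_assign:  (C, K) assumed projection giving soft assignments to K nodes
    adjacency: (K, K) assumed layout-graph adjacency between nodes
    w_graph:   (C, C) assumed weight for the graph reasoning step
    """
    # Map-to-Node: soft-assign every location to the K nodes, then pool
    assign = softmax(features @ w_assign, axis=1)        # (N, K), rows sum to 1
    nodes = assign.T @ features                            # (K, C)

    # Graph reasoning: propagate over row-normalized adjacency, then linear + ReLU
    deg = adjacency.sum(axis=1, keepdims=True)
    norm_adj = adjacency / np.maximum(deg, 1e-8)
    nodes = np.maximum(norm_adj @ nodes @ w_graph, 0.0)   # (K, C)

    # Node-to-Map: scatter node features back with the same assignment (residual)
    return features + assign @ nodes                       # (N, C)

# Usage on random data: 16 locations, 8 channels, 4 graph nodes
rng = np.random.default_rng(0)
N, C, K = 16, 8, 4
feats = rng.standard_normal((N, C))
out = lgr_layer(feats, rng.standard_normal((C, K)), np.ones((K, K)), rng.standard_normal((C, C)))
print(out.shape)  # (16, 8)
```

Reusing one assignment matrix for both directions keeps the sketch small; a learned, separate Node-to-Map projection would be an equally plausible design.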
Ranked #206 on 3D Human Pose Estimation on Human3.6M
The dominant object detection approaches treat the recognition of each region separately and overlook crucial semantic correlations between objects in one scene.