Zero-shot 6D object pose estimation involves the detection of novel objects with their 6D poses in cluttered scenes, presenting significant challenges for model generalizability.
We apply the proposed VI-Net to the challenging task of category-level 6D object pose estimation for predicting the poses of unknown objects without available CAD models; experiments on the benchmarking datasets confirm the efficacy of our method, which outperforms the existing ones with a large margin in the regime of high precision.
Domain gap between synthetic and real data in visual regression (e. g. 6D pose estimation) is bridged in this paper via global feature alignment and local refinement on the coarse classification of discretized anchor classes in target space, which imposes a piece-wise target manifold regularization into domain-invariant representation learning.
We also validate the effectiveness of affine transformation corruption with the Transformer backbones, where we decompose the reconstruction of the complete point cloud into the reconstructions of detailed local patches and rough global shape, alleviating the position leakage problem in the reconstruction.
Establishment of point correspondence between camera and object coordinate systems is a promising way to solve 6D object poses.
It is difficult to precisely annotate object instances and their semantics in 3D space, and as such, synthetic data are extensively used for these tasks, e. g., category-level 6D object pose and size estimation.
In this work, we make the first attempt, to the best of our knowledge, to consider the local geometry information explicitly into the masked auto-encoding, and propose a novel Masked Surfel Prediction (MaskSurf) method.
In this paper, we propose a novel design of Sparse Steerable Convolution (SS-Conv) to address the shortcoming; SS-Conv greatly accelerates steerable convolution with sparse tensors, while strictly preserving the property of SE(3)-equivariance.
DualPoseNet stacks two parallel pose decoders on top of a shared pose encoder, where the implicit decoder predicts object poses with a working mechanism different from that of the explicit one; they thus impose complementary supervision on the training of pose encoder.
Ranked #4 on 6D Pose Estimation using RGBD on REAL275
Point set is arguably the most direct approximation of an object or scene surface, yet its practical acquisition often suffers from the shortcoming of being noisy, sparse, and possibly incomplete, which restricts its use for a high-quality surface recovery.
Regularizing the targeted attack loss with our proposed geometry-aware objectives results in our proposed method, Geometry-Aware Adversarial Attack ($GeoA^3$).
Such a perspective enables us to study deep multi-view learning in the context of regularized network training, for which we present control experiments of benchmark image classification to show the efficacy of our proposed CorrReg.