We revisit multi-view stereo (MVS) as the feature matching task it fundamentally is, and accordingly propose a powerful Feature Matching Transformer (FMT) that leverages intra- (self-) and inter- (cross-) attention to aggregate long-range context information within and across images.
Ranked #1 on 3D Reconstruction on DTU
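The intra-/inter-attention idea can be sketched with plain scaled dot-product attention: self-attention aggregates context within one image's features, while cross-attention lets one image's features query the other's. This is a minimal NumPy illustration; the shapes, variable names, and single-head form are assumptions for the sketch, not details from the paper.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: each query aggregates the values,
    weighted by a softmax over query-key similarities."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # (Nq, Nk) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)              # softmax over keys
    return w @ v                                    # (Nq, d) aggregated context

rng = np.random.default_rng(0)
feat_a = rng.standard_normal((5, 8))   # toy features of image A (5 tokens, dim 8)
feat_b = rng.standard_normal((7, 8))   # toy features of image B (7 tokens, dim 8)

# Intra- (self-) attention: long-range context within image A.
self_a = attention(feat_a, feat_a, feat_a)

# Inter- (cross-) attention: image A's features query image B's.
cross_ab = attention(feat_a, feat_b, feat_b)

print(self_a.shape, cross_ab.shape)
```

In a full transformer block these operations would be wrapped with learned projections, multiple heads, and residual connections; the sketch only shows how attention mixes information within and across the two feature sets.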
Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial.
In this work, we introduce a general method for 3D self-supervised representation learning that 1) remains agnostic to the underlying neural network architecture, and 2) specifically leverages the geometric nature of 3D point cloud data.
We achieve this by jointly optimizing the parameters of two neural radiance fields and a set of rigid poses which align the two fields at each frame.
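The joint-optimization idea — fitting two fields while simultaneously solving for a rigid pose that makes them agree — can be shown in miniature. The sketch below is a toy analogue under strong assumptions: each "field" is a linear scalar function rather than a neural radiance field, the pose is a 2D rotation plus translation, and gradients are taken numerically; none of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground truth: a linear scalar "field" F(x) = a.x + c in world coordinates
# (a stand-in for a radiance field), and a rigid pose (theta, t) relating
# frame B to the world frame (frame A coincides with the world).
a_true, c_true = np.array([1.0, -0.5]), 0.3
theta_true, t_true = 0.3, np.array([0.2, -0.1])

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

xs_a = rng.standard_normal((20, 2))                 # samples observed in frame A
xs_b = rng.standard_normal((20, 2))                 # samples observed in frame B
ys_a = xs_a @ a_true + c_true
ys_b = (xs_b @ rot(theta_true).T + t_true) @ a_true + c_true

def unpack(p):
    # field 1 (a1, c1), field 2 (a2, c2), pose (theta, t)
    return p[0:2], p[2], p[3:5], p[5], p[6], p[7:9]

def loss(p):
    a1, c1, a2, c2, theta, t = unpack(p)
    f1, f2 = xs_a @ a1 + c1, xs_b @ a2 + c2
    # Each field fits its own frame's observations ...
    data = np.mean((f1 - ys_a) ** 2) + np.mean((f2 - ys_b) ** 2)
    # ... while the rigid pose must make field 1, queried at the transformed
    # frame-B points, agree with field 2 (the alignment term).
    align = np.mean(((xs_b @ rot(theta).T + t) @ a1 + c1 - f2) ** 2)
    return data + align

def num_grad(f, p, eps=1e-5):
    g = np.zeros_like(p)
    for i in range(p.size):
        d = np.zeros_like(p); d[i] = eps
        g[i] = (f(p + d) - f(p - d)) / (2 * eps)
    return g

p = np.zeros(9)                      # jointly optimize fields AND pose
for _ in range(2000):
    p -= 0.05 * num_grad(loss, p)

print(f"final loss {loss(p):.4f}, recovered angle {unpack(p)[4]:.3f}")
```

The alignment term is what couples the two fields: it is minimized only when the pose correctly maps frame-B queries into frame A, which is the mechanism the abstract describes, scaled down to a closed-form toy.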
Point cloud registration is a fundamental problem in 3D computer vision, graphics and robotics.
Recently, neural networks operating on point clouds have shown superior performance on 3D understanding tasks such as shape classification and part segmentation.
Shape completion, the problem of estimating the complete geometry of objects from partial observations, lies at the core of many vision and robotics applications.
Ranked #3 on Point Cloud Completion on ShapeNet