CodeSLAM represents the 3D geometry of a scene using the latent space of a variational autoencoder whose decoder is conditioned on the RGB image. The depth thus becomes a function of the RGB image and an unknown code, $D = G_\theta(I, c)$. During training, the weights of the network $G_\theta$ are learnt by training the generator and encoder on a standard autoencoding task. At test time, the code $c$ of each image and the camera poses are found jointly by optimizing the reprojection error over multiple images.
Source: CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM
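The test-time step can be illustrated with a toy problem. The sketch below is not the CodeSLAM implementation and rests on several assumptions. The decoder is taken to be affine in the code, `depth = mu + B @ c`, with random arrays `mu` and `B` standing in for the image-conditioned outputs of $G_\theta(I, \cdot)$. The camera is a 1D pinhole camera, and the second view differs from the reference view only by an unknown x-translation `t`. The error is a geometric reprojection error on known pixel correspondences rather than a photometric one. All names (`decode_depth`, `project`, `gauss_newton`, and so on) are hypothetical. The sketch jointly recovers the code and the pose with plain Gauss-Newton from synthetic, noise-free observations.

```python
import numpy as np

# Toy setup (assumptions, not the CodeSLAM implementation):
# - a 1D pinhole camera with focal length F observing N pixels on one scanline
# - a decoder that is affine in the code: depth = mu + B @ c, where mu and B
#   stand in for the image-dependent decoder outputs mu(I) and B(I)
# - a second view that differs only by an unknown x-translation t
F = 100.0
N, K = 20, 2

rng = np.random.default_rng(0)
u = np.linspace(-50.0, 50.0, N)        # pixel coordinates in the reference view
mu = rng.uniform(2.0, 4.0, N)          # hypothetical mean depth predicted from the image
B = rng.normal(0.0, 0.1, size=(N, K))  # hypothetical code-to-depth basis


def decode_depth(c):
    """Stand-in for D = G_theta(I, c), assumed affine in the code c."""
    return mu + B @ c


def project(depth, t):
    """Project reference-view pixels into a camera translated by t along x."""
    # Back-project X = u * Z / F with Z = depth, then re-project u' = F * (X - t) / Z.
    return u - F * t / depth


def residuals(c, t, u_obs):
    """Reprojection error of the decoded depth under pose t."""
    return project(decode_depth(c), t) - u_obs


def jacobian(c, t):
    """Analytic Jacobian of the residuals with respect to (c, t)."""
    d = decode_depth(c)
    dr_dc = (F * t / d**2)[:, None] * B  # chain rule: dr/dd * dd/dc
    dr_dt = -F / d                        # dr/dt
    return np.column_stack([dr_dc, dr_dt])


def gauss_newton(u_obs, c0, t0, iters=30):
    """Jointly optimize the code c and the pose t by minimizing reprojection error."""
    c, t = np.array(c0, dtype=float), float(t0)
    for _ in range(iters):
        r = residuals(c, t, u_obs)
        J = jacobian(c, t)
        # Gauss-Newton step: least-squares solution of J @ step = -r
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        c += step[:K]
        t += step[K]
    return c, t


c_true, t_true = np.array([0.5, -0.3]), 0.2
u_obs = project(decode_depth(c_true), t_true)  # synthetic, noise-free observations

c_est, t_est = gauss_newton(u_obs, c0=np.zeros(K), t0=0.15)
print(np.round(c_est, 4), round(t_est, 4))
```

In this toy, the mean depth `mu` is (almost surely) not in the span of `B`, so the decoder prior fixes the scale and the code and `t` are jointly identifiable; with a real network, the Jacobian with respect to the code would come from the decoder rather than from a fixed matrix.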

| Task | Papers | Share |
|---|---|---|
| 3D Reconstruction | 1 | 33.33% |
| Depth Estimation | 1 | 33.33% |
| Scene Understanding | 1 | 33.33% |